Introduction


1.1  Overview 

The  MIT  Laboratory  for  Computer  Science  (LCS)  is  an  interdepartmental  laboratory  whose 
principal  goal  is  research  in  computer  science  and  engineering. 

Founded  as  Project  MAC  in  1963,  the  Laboratory  developed  one  of  the  world’s  earliest 
timeshared  computer  systems.  This  early  research  on  the  Compatible  Time  Sharing  System 
(CTSS)  and  its  successor,  MULTICS,  made  possible  innovative  developments  such  as  the 
writing  of  operating  systems  in  high  level  programming  languages,  virtual  memory,  tree 
directories,  online  scheduling  algorithms,  line  and  page  editors,  secure  operating  systems, 
concepts  and  techniques  for  access  control,  computer-aided  design,  and  two  of  the  earliest 
computer  games — space  wars  and  computer  chess. 

These  early  developments  laid  the  foundation  for  the  Laboratory’s  work  in  the  1970’s  on 
knowledge-based  systems — for  example,  the  MACSYMA  program  for  symbolic  mathematics — 
natural language understanding, and (with BBN) the development and use of packet networks.
During this same period, the Laboratory developed theoretical results in complexity
theory and linked cryptography to computer science through concepts and algorithms for
public-key encryption (RSA). In the late 1970’s, Project MAC, renamed as the Laboratory for
Computer  Science  (LCS),  embarked  on  research  in  clinical  decision  making,  the  exploration 
of  cellular  automata  at  the  borderline  between  physics  and  computation,  and  on  the  social 
impact  of  computers.  At  the  same  time,  it  began  two  major  research  programs  in  distributed 
systems  and  languages,  and  in  parallel  systems.  These  led  to  the  notion  of  data  abstractions 
and  the  Clu  language,  the  Argus  distributed  system,  the  dataflow  principle  and  associated 
languages and architectures of parallel systems, local area ring networks, program specification,
and workstation development, where the Laboratory contributed the earliest UNIX
ports  and  compilers,  and  the  NuBus  architecture,  now  used  in  commercial  computers  like 
Apple’s  Macintosh  II.  This  research  has  also  led  to  the  X  Window  System,  a  computer 
intercommunication  standard,  developed  together  with  Project  Athena. 

The  Laboratory’s  current  research  falls  into  four  principal  categories:  Parallel  Systems; 
Systems,  Languages,  and  Networks;  Intelligent  Systems;  and  Theory.  The  principal  technical 
goals and expected consequences in each of these four categories are as follows:

In Parallel Systems, we strive to harness the power and economy of numerous processors
working  on  the  same  task.  Research  in  the  area  involves  the  analysis  and  construction 
of  various  hardware  architectures  and  programming  languages  that  yield,  over  a  broad  set 
of  applications,  cost-performance  improvements  of  several  orders  of  magnitude  relative  to 
single processors. This research is expected to affect most of tomorrow’s machines, which we
expect to be of the multiprocessor variety, not only because of potential cost-performance
benefits, but also because of the natural, yet unexploited, concurrency that characterizes
contemporary  and  prospective  applications  from  business  to  sensory  computing. 

In Systems, Languages, and Networks, our objective is to provide the concepts, methods,
and  environments  that  will  enable  heterogeneous  computers,  each  working  on  different  tasks, 
to  communicate  efficiently,  conveniently,  and  reliably  with  each  other  in  order  to  exchange 
information needed and supplied by their respective programs. Such communication may
involve,  beyond  conventional  electronic  mail  and  file  transfer,  the  calling  of  programs  in 
one  environment  from  programs  in  another,  perhaps  different,  environment;  the  storage  and 
sharing  of  structured  data  among  such  programs;  and  the  use  of  an  information  infrastructure 
consisting  of  common  computer  and  communication  resources.  This  research  is  also  expected 
to  have  a  broad  impact  on  future  systems  because  virtually  every  machine  will  be  connected 
to  some  network. 

Taken  together,  these  two  thrusts  in  parallel  and  networked  machines  signal  our  expectation 
that  future  computer  systems  will  consist  of  multiprocessors  interconnected  by  local  and  long 
haul  networks,  and  perhaps  some  day  by  national  network  infrastructures  as  ubiquitous  and 
as  important  as  today’s  telephone  and  highway  infrastructures. 

In  the  Intelligent  Systems  area,  our  technical  goals  are  to  understand  and  construct  programs 
and machines that have greater and more useful sensory and cognitive capabilities. Examples
include the understanding of spoken messages, systems that can learn from practice rather
than by being explicitly programmed, and programs that reason about clinical issues and
help  in  clinical  decision  making.  We  expect  tomorrow’s  intelligent  systems  to  be  easier  to 
use  than  today’s  programs  across  a  broad  front  of  applications. 

In our fourth category of research, Theory, we strive to understand and discover the fundamental
forces, rules, and limits of computer science. Theoretical work permeates many of
our research efforts in the other three areas, for example, in the pursuit of parallel algorithms
and in the study of fundamental properties of idealized parallel architectures and computer
networks. Theory also touches on several predominantly abstract areas, like the logic of
programs, the inherent complexity of computations, and the use of cryptography and randomness
to the formal characterization of knowledge. The impact of theoretical computer
science upon our world is expected to continue its past record of improving our understanding
of, and helping us to pursue, new frontiers with new models, concepts, methods, and
algorithms.


1.2  Highlights  of  the  Year 


Research  highlights  during  the  reporting  period  were  as  follows: 


1.  Through  the  Laboratory’s  Spoken  Language  Systems  Group,  we  began  exploration  of 
an  international  interpretive  telephony  effort.  Users  of  this  telephone  would  speak  in 
their  native  tongue  using  a  limited  vocabulary  of  a  few  hundred  words  in  a  narrow 
domain of discourse, for example, appointments, visits, and travel plans that lead
to  meetings.  Each  sentence  would  be  translated  through  an  intermediate  language 
(I.L.)  to  the  language  of  the  other  party.  It  would  also  be  simultaneously  translated 
back  from  I.L.  to  the  original  language  to  ensure  the  system  “understood”  what  was 
said.  To  date,  we  have  secured  informal  partnerships  in  Europe  and  Japan  for  the 
purpose  of  carrying  out  this  research. 

2. Professor David Tennenhouse and his associates began research on computer workstations
that will deal with video images, just as today’s workstations handle text. Novel
processing-on-the-fly methods are being explored in this area, in addition to the more
traditional retrieve-process-and-store techniques. Visual images are likely to be used
increasingly  because  people  are  becoming  more  used  to  them  and  because  they  can  cut 
across  linguistic  barriers. 

3.  Professor  Stephen  Ward  completed  a  prototype  of  the  NuMesh — a  “Tinkertoy”  system 
that  enables  special  purpose  computers  to  be  built  out  of  general  purpose,  small  size 
blocks.  The  resultant  machines  are  expected  to  carry  out  special  purpose  processing 
at  very  high  speeds. 

The Laboratory’s Distinguished Lecturer Series included presentations by John Hennessy,
Bell Professor of Electrical Engineering and Computer Science, Stanford University; Terrence
J. Sejnowski, Director, Computational Neurobiology Laboratory, Salk Institute and Professor
of Biology and Physics, University of California, San Diego; Ronald L. Graham, Adjunct
Director, Research, Information Sciences Division, AT&T Bell Laboratories; Robert M. Metcalfe,
Ethernet Inventor and 3Com Corporation Founder; and Gordon Plotkin, Professor of
Computer Science, University of Edinburgh.

Professors  Leo  Guibas  and  Mauricio  Karchmer  both  joined  the  Laboratory  as  members  of 
the  Theory  of  Computation  Group  and  Messrs.  Joseph  Polifroni  of  the  Spoken  Language 
Systems  Group  and  Kenneth  Streeter  of  the  Information  Mechanics  Group  became  members 
of the research staff. Changes in the administrative staff included the departure of Mr.
William Fitzgerald, who was replaced as Fiscal Officer by Ms. Azadeh Djazani, and the
assignment of Ms. Carol Robinson to Personnel Officer.

The Laboratory is organized into 15 research groups, an administrative unit, and a computer
service support unit. The Laboratory’s membership includes a total of 400 people: 110
faculty and research staff, 40 visitors, affiliates, and postdoctoral associates, 35 support staff,
160 graduate students, and 55 undergraduate students. The academic affiliation of most of
the faculty and students is with the Department of Electrical Engineering and Computer
Science (EECS).

The Laboratory’s funding comes predominantly from the U.S. Government’s Defense Advanced
Research Projects Agency, which accounts for about half of the total. In addition,
we are funded by and have extensive links with industrial organizations. These include partnerships
for the construction of major hardware systems, consortia for the development and
maintenance  of  standards,  like  the  X  Window  system,  and  joint  studies  on  research  areas 
of  common  concern.  Technical  results  of  our  research  in  1989-90  were  disseminated  through 
publications  in  the  technical  literature,  through  Technical  Reports,  numbered  454  through 
479,  and  Technical  Memoranda,  numbered  401  through  432. 


2.1  Introduction


The  goal  of  the  Advanced  Network  Architecture  project  continues  to  be  the  definition  of  a 
protocol architecture that will achieve application data transfer at a gigabit per second or more while
meeting  other  requirements  for  quality  of  service  and  media  independence.  The  central 
problem  of  our  group  has  been  the  management  of  bandwidth,  switching  capacity,  and 
buffer  resources  within  the  network.  If  we  are  to  achieve  higher  speeds  and  larger  size,  the 
tradeoffs  among  these  resources  must  change,  and  new  algorithms  and  approaches  will  be 
needed. 

In the following sections, a number of specific projects related to this overall goal are described.

2.2  Rate-based  Flow  Control 

Previously,  we  proposed  rate-based  flow  control  as  a  key  concept  for  resource  management  in 
tomorrow’s  networks.  Zhang’s  thesis  [300]  proposed  a  specific  control  scheme  for  rate-based 
control,  including  the  resource  allocation  algorithm  in  the  gateway  and  a  matching  control 
scheme  in  the  host.  The  concept  of  a  virtual  clock  to  meter  the  traffic  in  the  various  flows  is 
central  to  the  scheme.  Extensive  simulation  indicates  that  the  scheme  has  great  promise. 
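
As a rough illustration of how a virtual clock can meter flows, the following Python sketch stamps each arriving packet with a per-flow clock that advances by the packet's length divided by the flow's reserved rate, and then transmits packets in stamp order. The class name, the method names, and the exact update rule are illustrative assumptions; they are not taken from [300].

```python
import heapq


class VirtualClockGateway:
    """Toy gateway that orders packets by per-flow virtual clock stamps."""

    def __init__(self, rates):
        # rates: flow id -> reserved rate in bits per second (assumed known).
        self.rates = dict(rates)
        self.vclock = {flow: 0.0 for flow in self.rates}
        self.queue = []   # heap of (stamp, seq, flow, bits)
        self.seq = 0      # tie-breaker so equal stamps are served in arrival order

    def arrive(self, flow, bits, now):
        # An idle flow restarts from real time; a flow sending faster than
        # its reserved rate runs its clock ahead and receives later stamps.
        start = max(now, self.vclock[flow])
        stamp = start + bits / self.rates[flow]
        self.vclock[flow] = stamp
        heapq.heappush(self.queue, (stamp, self.seq, flow, bits))
        self.seq += 1
        return stamp

    def next_packet(self):
        # Transmit the queued packet with the smallest stamp, if any.
        if not self.queue:
            return None
        stamp, _, flow, bits = heapq.heappop(self.queue)
        return flow, bits, stamp
```

In this sketch a flow that bursts three packets at once does not delay a single packet from a second flow with the same reserved rate: the second flow's packet is stamped as if the burst had been spread out at the reserved rate.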

An  analysis  performed  by  Liang  Wu  (on  sabbatical  from  Bellcore)  related  the  Zhang  virtual 
clock  scheme  to  another  scheme,  the  so-called  leaky  bucket  scheme  being  proposed  for  resource 
management  in  telephone  networks. 
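
For readers unfamiliar with the leaky bucket idea, the Python sketch below shows one common formulation, used here as a meter that decides whether an arriving packet conforms to a declared rate and burst size. The class name and parameters are hypothetical, and the sketch does not reproduce the analysis relating this scheme to the virtual clock.

```python
class LeakyBucket:
    """Toy leaky bucket meter: drains at a fixed rate, overflows on bursts."""

    def __init__(self, rate, depth):
        self.rate = rate      # drain rate in bits per second
        self.depth = depth    # bucket capacity in bits (largest tolerated burst)
        self.level = 0.0      # bits currently held in the bucket
        self.last = 0.0       # time of the previous arrival, in seconds

    def conforms(self, bits, now):
        # Drain the bucket for the time elapsed since the last arrival.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + bits > self.depth:
            return False      # packet would overflow the bucket
        self.level += bits
        return True
```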

Helmut  Rebstock,  a  visiting  scientist  from  Siemens  Corporation,  and  James  Davin  simulated 
the  behavior  of  a  novel  rate  adjustment  algorithm  proposed  by  David  Tennenhouse.  By  this 
algorithm,  packets  are  always  generated  at  the  maximal  desired  rate,  but  only  a  certain 
portion  of  the  generated  packets  represent  useful  data:  the  rest  are  “empty”  packets.  The 
receiver  periodically  sends  control  information  to  the  transmitter  about  the  rate  at  which 
packets are actually received, and the transmitter adjusts the relative rate of empty packet
transmission accordingly. Simulation showed that this algorithm successfully copes with
congestion  in  simple  networks,  but  it  does  not  extend  well  to  more  complicated  topologies. 
Tennenhouse  subsequently  proposed  an  extension  of  the  empty  packet  scheme  by  which  a 
congested  packet  switch  discards  empty  packets  first.  This  augmented  scheme  has  not  been 
simulated  to  date. 
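
The empty packet idea can be sketched with a toy model in Python. Everything specific here is an assumption made for illustration: the adjustment rule (set the useful share to the ratio of received rate to generation rate), the single bottleneck, and the even spreading of loss in the unaugmented case. The sketch is not the simulated algorithm itself.

```python
def useful_fraction(max_rate, received_rate):
    """Share of generated packets that should carry data (assumed rule)."""
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    return max(0.0, min(1.0, received_rate / max_rate))


def delivered_useful(max_rate, fraction, capacity, drop_empty_first):
    """Useful packets per second that survive a bottleneck of given capacity."""
    useful = fraction * max_rate
    if capacity >= max_rate:
        return useful                      # no congestion, nothing is lost
    if drop_empty_first:
        return min(useful, capacity)       # switch sheds empty packets first
    return useful * capacity / max_rate    # assumed: loss spread evenly
```

With a generation rate of 100 packets per second, a bottleneck of 40, and the useful share set to 0.4, this toy model delivers 16 useful packets per second when loss is indiscriminate and 40 when empty packets are shed first.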


2.3  Protocol  Performance  Studies 


In order to explore the contrast between rate-based and window-based flow control mechanisms,
Rajiv Jain, a visiting scientist from the ITT in India, undertook to study the behavior
of TCP in the presence of high bandwidth, long delay links. Jain simulated the slow-start
TCP algorithm over gigabit links with transcontinental delays. Initial results suggested that,
although TCP afforded good link utilization in a benign environment, it degraded significantly
in the presence of even modest packet loss. More conclusive results were not pursued
owing  to  the  significant  effort  required  for  each  experiment. 
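
A back-of-the-envelope calculation suggests why such links are demanding for a window-based protocol. The figures used below (a 1 Gb/s link, a 60 ms round-trip time, 1460-byte segments) are illustrative assumptions, not the parameters of the simulations described above, and the model assumes the window doubles once per round trip during slow start.

```python
import math


def bandwidth_delay_product_bytes(bits_per_second, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return bits_per_second * rtt_seconds / 8


def slow_start_round_trips(window_bytes, segment_bytes):
    """Round trips of doubling needed to grow from one segment to the window."""
    segments = math.ceil(window_bytes / segment_bytes)
    return math.ceil(math.log2(segments)) if segments > 1 else 0


window = bandwidth_delay_product_bytes(1e9, 0.060)
rounds = slow_start_round_trips(window, 1460)
```

Under these assumed figures about 7.5 MB must be in flight to fill the link, and a sender that restarts from one segment after a loss needs about 13 round trips of doubling to regain that window.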

Another  approach  to  studying  TCP  behavior  was  continued  by  Timothy  Shepard.  Previously, 
a  system  for  collecting  and  storing  about  12  hours  of  the  protocol  headers  of  all  the  packets 
on  one  of  the  main  Ethernets  in  the  Laboratory  had  been  built  and  is  now  in  continuous 
operation.  The  system  is  used  as  a  network  debugging  aid  and  as  a  source  of  traces  to 
support  research  in  the  analysis  of  TCP  packet  traces. 

Examination of a trace of packets collected from the network is often the only method available
for diagnosing protocol performance problems in computer networks. Shepard’s thesis
[269] explores the use of packet traces to diagnose performance problems of the transport
protocol TCP. Because manual examination of these traces can be so tedious as to preclude
detailed analysis, a more effective method is developed: the primary contribution of this thesis
is a graphical method for displaying the packet trace which greatly reduces the tedium
of  its  analysis. 

This  graphical  method  is  demonstrated  by  the  examination  of  packet  traces  from  typical 
TCP  connections.  The  performance  of  two  different  implementations  of  TCP  sending  data 
across  a  particular  network  path  is  compared.  Traces,  many  thousands  of  packets  long, 
are  used  to  demonstrate  how  effectively  the  graphical  method  simplifies  examination  of 
long,  complicated  traces.  Because  the  burstiness  of  TCP  transmitters  observed  in  packet 
traces  seems  occasionally  related  to  their  achieved  throughput,  a  method  of  quantifying  this 
burstiness  is  presented  and  its  possible  relevance  to  understanding  the  performance  of  TCP 
is  discussed. 

To  facilitate  study  of  collected  packet  traces,  Shepard  developed  an  interactive,  X-based  tool 
that  displays  the  detailed  behavior  of  a  TCP  connection  according  to  the  graphical  method 
he  describes.  This  tool,  which  shows  the  timing  relationship  of  the  various  events  in  the 
protocol transaction, permits a sophisticated analyst to diagnose and debug TCP problems
at  high  speed.  This  tool  has  been  used  with  great  success  on  local  MIT  networks  and  on  the 
ARPANET  to  examine  a  variety  of  performance  and  functional  problems. 


2.4  Fair  Queueing  in  Gateways 


James  Davin  and  Andrew  Heybey  continued  their  exploration  of  the  “Fair  Share”  queueing 
algorithm  developed  at  Xerox  PARC  [89].  A  series  of  simulations  demonstrated  that,  in  a 
connectionless  network,  a  gateway  which  allocates  outgoing  link  bandwidth  according  to  a 
fair queueing algorithm can effectively regulate and share the use of a trunk, as, for example,
among  several  government  agencies  jointly  procuring  and  using  a  link.  In  its  simplest  form, 
the  algorithm  enforces  fairness — no  user  may  use  more  than  its  fair  share  of  the  output 
bandwidth.  It  was  shown  to  be  effective  in  enforcing  non-uniform  policies  by  which  some 
users  are  accorded  a  larger  share  of  the  bandwidth  than  others. 
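
The way a gateway can enforce uniform or non-uniform shares may be sketched as follows. This Python example is a simplified, self-clocked variant of fair queueing written for illustration; it is not the algorithm of [89] nor either of the variations studied here, and the class name, the per-user weights, and the virtual time rule are assumptions.

```python
import heapq


class WeightedFairQueue:
    """Toy fair queue: each packet gets a finish tag; smallest tag goes first."""

    def __init__(self, weights):
        self.weights = dict(weights)            # user -> relative share
        self.last_finish = {u: 0.0 for u in self.weights}
        self.virtual_time = 0.0                 # tag of the last packet served
        self.heap = []                          # (finish, seq, user, bits)
        self.seq = 0                            # tie-breaker for equal tags

    def enqueue(self, user, bits):
        # A backlogged user continues from its previous tag; an idle user
        # restarts from the current virtual time and gains no credit.
        start = max(self.virtual_time, self.last_finish[user])
        finish = start + bits / self.weights[user]
        self.last_finish[user] = finish
        heapq.heappush(self.heap, (finish, self.seq, user, bits))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, user, bits = heapq.heappop(self.heap)
        self.virtual_time = finish              # self-clocked approximation
        return user, bits
```

With equal weights the sketch shares the link evenly among backlogged users; giving one user a weight of 2 and another a weight of 1 yields a two-to-one split of the transmitted packets while both remain backlogged.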

Three fair queueing algorithms were evaluated by simulation: the algorithm originally described
in [89], a known (but previously unexplored) variation on that algorithm that is more
accommodating  to  bursty  sources,  and  a  novel  variation  on  the  algorithm  that  is  distinguished 
by  a  simplified  buffer  management  scheme.  These  three  algorithms  were  evaluated  for  their 
capacity  to  enforce  uniform  and  non-uniform  policies  in  the  face  of  network  demands  that 
ranged from light to heavy, static to dynamic, cooperative to non-conformant. The considered
algorithms afforded effective policy enforcement in a variety of circumstances for which
traditional  first-come-first-served  gateway  policies  failed  to  do  so.  As  might  be  expected, 
none  of  the  considered  algorithms  were  able  to  correct  for  throughput  disparities  that  arise 
from  closed-loop  flow  control  mechanisms  in  networks  of  heterogeneous  delays.  A  paper 
describing  this  work  is  in  preparation. 

2.5  Random  Drop  Queue  Management 

An  algorithm  often  called  random  drop  has  been  proposed  by  various  workers  in  the  field  as 
a  simpler  alternative  to  fair  queueing  for  controlled  allocation  in  gateways.  Eman  Hashem 
completed  her  study  of  random  drop  and  other  congestion-related  phenomena  [141].  Based  on 
simulation  experiments,  she  concludes  that  random  drop  can  compensate  for  certain  peculiar 
meta-stable  conditions  leading  to  unfair  allocation  but  that  random  drop  cannot  compensate 
for  important  and  fundamental  causes  of  unfairness,  such  as  different  roundtrip  times  for 
connections  across  the  network.  Hashem  also  investigated  a  variant  of  the  random  drop 
scheme — early  random  drop — that  aspires  to  congestion  avoidance  rather  than  congestion 
recovery.  She  finds  that  the  success  of  early  random  drop  is  problematic  insofar  as  it  depends 
upon  the  development  of  effective  algorithms  for  dynamic  adjustment  of  the  drop  rate. 
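
The two queue management policies can be contrasted in a short Python sketch. It follows the usual description of these schemes (evict a randomly chosen packet on overflow, or drop arrivals with some probability once the queue passes a threshold); the function names, the fixed drop probability, and the threshold parameter are assumptions for illustration and are not drawn from [141].

```python
import random


def random_drop_enqueue(queue, packet, capacity, rng=random):
    """Append packet; on overflow evict one queued packet chosen uniformly.

    Returns the dropped packet, or None if nothing was dropped. Users holding
    more of the queue are proportionally more likely to lose a packet.
    """
    queue.append(packet)
    if len(queue) <= capacity:
        return None
    return queue.pop(rng.randrange(len(queue)))


def early_random_drop_enqueue(queue, packet, capacity, threshold,
                              drop_probability, rng=random):
    """Drop arrivals with a fixed probability once the queue passes threshold."""
    if len(queue) >= capacity:
        return packet                     # queue is full: arrival is lost
    if len(queue) >= threshold and rng.random() < drop_probability:
        return packet                     # early drop, before overflow occurs
    queue.append(packet)
    return None
```

The fixed `drop_probability` is the simplification that matters most here: the difficulty noted above is precisely that a practical scheme would need to adjust this value dynamically.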


2.6  Next  Generation  Protocol  Architecture 

An  overall  research  objective  of  the  group  is  to  synthesize,  out  of  specific  study  areas,  an 
overall  protocol  architecture  for  the  networks  of  tomorrow. 

To  this  end,  David  Tennenhouse  has  considered  the  relationship  between  popular  network 
design  strategies  and  the  performance  requirements  of  future  network  applications.  The 
ATM  (Asynchronous  Transfer  Mode)  approach  to  broadband  networking  is  presently  being 
pursued  within  the  CCITT  (and  elsewhere)  as  the  unifying  mechanism  for  the  support  of 
service  integration,  rate  adaptation,  and  jitter  control  within  the  lower  layers  of  the  network 
architecture.  Tennenhouse  prepared  a  paper  [277]  concerned  with  the  jitter  (variation  in 
delay)  arising  from  the  design  of  the  middle  and  upper  layers  that  operate  within  the  end 
systems  and  relays  of  multi-service  networks. 

In order to augment the ongoing discussion of ATM, Tennenhouse organized the ATM Practitioner’s
Workshop, held January 22-24 at the MIT Endicott House Conference Center.
This conference brought together over 30 individuals with a wide variety of perspectives on
ATM, and realized an unusual opportunity for discussion between participants in the B-ISDN
standards committees and academic researchers in the field. Although the context of
the discussion was current ATM standards activity, it focused on reports of recent research
results  and  proposals  for  future  work. 

Exploration  of  a  broad  range  of  architectural  issues  was  realized  by  the  group’s  participation 
in  the  Internet  Research  Steering  Group  Workshop  on  Very  High  Speed  Networking,  held 
January  24-26  in  Cambridge,  MA.  David  Clark  served  on  the  program  committee  for  this 
conference and chaired a working group session on protocol implementation. David Tennenhouse
made a presentation on relevant results from the ATM Practitioner’s Workshop.

The  group  was  able  to  contribute  to  the  evolution  of  national  networking  infrastructure  by 
participation  in  a  workshop  on  the  NRI  Networking  Testbed  held  in  December  in  Reston, 
Virginia. 


2.7  Research  Collaboration 


During  the  current  year,  the  group  established  a  number  of  key  research  collaborations  that 
will  augment  and  advance  its  research  agenda.  Among  the  most  important  is  a  collaboration 
with  Bellcore  by  which  protocol  concepts  developed  within  the  group  will  be  demonstrated 
using  a  prototype  ATM  switch  under  development  there.  As  a  part  of  this  effort,  members 
of  this  group  contributed  to  the  design  of  key  parts  of  that  switch — in  particular,  the  port 
controller  at  the  input  and  output  of  the  actual  switch  fabric. 

David Tennenhouse served on a committee to discuss possible joint ventures between researchers
on the MIT campus and at MIT Lincoln Laboratories to explore all-optical networking
architectures.


2.8  Broadband  ISDN  Standards  Activities 

The  current  efforts  within  ANSI  T1S1  to  develop  the  standards  for  Broadband  ISDN  will, 
if  successful,  define  the  nature  of  the  U.S.  telecommunications  infrastructure  for  the  next 
several  decades.  Because  of  the  importance  of  this  effort,  we  attempted  to  contribute  in 
ways  that  preserve  and  enhance  the  utility  of  the  network  for  computer  interconnect.  David 
Tennenhouse  attended  meetings  of  the  relevant  standards  committees. 

Alan  Buzacott  also  attended  standards  meetings  and  wrote  an  analysis  of  the  broadband 
standards process [65]. He concludes that, although the emerging standards may be technically
flawed in some details, the broadband ISDN standards process succeeded in introducing
fundamental  conceptual  changes  into  public  networks  in  a  timely  and  appropriate  way. 


2.9  Internet  Naming  Services 

During  the  year,  a  plan  for  development  and  deployment  of  a  “white-pages”  naming  service 
for  the  Internet  was  prepared.  International  standards  such  as  X.500  attempt  to  define 
such a service and, indeed, should be the basis of an Internet service. However, significant
additional  effort  will  be  required  to  realize  a  practical,  standards-based  system.  An  RFC  by 
Karen  Sollins  [273]  outlines  a  possible  program  in  this  area. 

As  part  of  this  activity,  Trevor  Mendez  wrote  an  X.500  Directory  User  Agent,  based  on  the  X 
Window  System,  that  works  with  the  Quipu  X.500  server  from  University  College,  London. 
To  further  simplify  the  user  interface  and  increase  functionality,  he  also  integrated  Kerberos 
authentication  into  Quipu. 


2.10  Network  Management 


James Davin continued his efforts to develop the Simple Network Management Protocol
(SNMP) by participating in the relevant Internet Engineering Task Force working groups,
by contributing to documentation of the protocol [68][67], and through trial implementations.
His current efforts involve the addition of authentication to
SNMP [115][86][222].

2.11  Advanced  Network  Simulator 

The  interactive  network  simulator  that  supports  much  of  the  simulation  work  in  the  group 
was  further  refined  and  enhanced  during  this  year.  New  simulator  components  were  crafted 
to  support  a  variety  of  experiments.  David  Martin  developed  a  facility  for  the  graphical 
display  of  relative  link  utilizations  on  links  shared  by  multiple  network  users.  Andrew  Heybey 
ported  the  simulator  to  MIPS  M/120  workstation  hardware.  James  Davin  and  Andrew 
Heybey  ported  the  simulator  to  the  Cray  2  supercomputer,  although  the  performance  of 
this  latter  port  is  less  than  might  otherwise  be  expected  owing  to  the  limited  opportunities 
for  vectorization  in  typical  network  simulations.  A  variety  of  bugs  have  been  fixed,  and  the 
improvements  have  been  released  to  other  interested  parties,  for  whom  at  least  a  minimal 
level  of  support  is  provided.  The  simulator  is  being  actively  used  by  people  at  Washington, 
Cray,  Purdue,  and  Mitre. 


2.12  Hardware  Development  Tools 


Jonathan  Coburn  developed  Xil,  a  digital  logic  description  language  embedded  in  Scheme. 
Xil  allows  digital  circuits  to  be  defined  in  terms  of  boolean  logic  functions  and  registers.  It 
outputs  configuration  information  for  Xilinx  reconfigurable  logic  arrays,  a  family  of  software- 
programmable  gate  arrays.  Xil’s  Scheme  embedding  allows  hardware  descriptions  to  be 
generated  algorithmically,  while  use  of  the  Xilinx  LCAs  allows  complex  designs  to  be  rapidly 
instantiated  as  single,  reusable  chips. 
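
The benefit of generating a hardware description algorithmically from a host language can be illustrated with a small sketch. The code below is a Python stand-in, not Xil or Scheme: the tuple-based expression format and the names `full_adder`, `ripple_adder`, and `evaluate` are hypothetical, and nothing here produces Xilinx configuration data.

```python
def full_adder(a, b, cin):
    """Describe one full adder as boolean expressions; returns (sum, carry)."""
    partial = ("xor", a, b)
    total = ("xor", partial, cin)
    carry = ("or", ("and", a, b), ("and", cin, partial))
    return total, carry


def ripple_adder(width):
    """Generate an adder of any width by looping, rather than by hand."""
    carry = ("const", 0)
    sums = []
    for i in range(width):
        bit, carry = full_adder(("in", f"a{i}"), ("in", f"b{i}"), carry)
        sums.append(bit)
    return sums, carry


def evaluate(expr, env):
    """Evaluate a description against input values, for checking only."""
    op = expr[0]
    if op == "in":
        return env[expr[1]]
    if op == "const":
        return expr[1]
    left, right = (evaluate(e, env) for e in expr[1:])
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    if op == "xor":
        return left ^ right
    raise ValueError(f"unknown operator: {op}")
```

The loop in `ripple_adder` is the point of the example: the width of the circuit is an ordinary program parameter, so a family of designs comes from one description.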

Xil  is  described  in  [76].  It  is  currently  in  use  as  a  hardware  development  tool  in  support  of 
other  ANA  research  activities. 


2.13  Internet  Protocol  Committee  Participation


Because  of  the  continuing  importance  of  the  Internet  protocol  suite,  and  because  of  the 
potential  cross-fertilization  between  our  research  goals  and  the  future  needs  of  the  Internet, 
members  of  the  research  group  continue  to  participate  in  Internet  working  groups.  During 
the  year,  David  Clark  resigned  as  chairman  of  the  Internet  Activities  Board,  a  position  he 
had held since 1981. He continues to serve on the IAB, and chairs one of its two subcommittees,
the Internet Research Steering Group. Members of the group have attended Internet
Engineering Task Force meetings, as well as IETF working groups and ad hoc committees
as appropriate, including various meetings to discuss naming in the Internet. Of the various
activities of the Internet Research Task Force, Clark contributed to the End-to-End Research
Group, and both Clark and Karen Sollins participated in the Autonomous Networks
Research  Group. 


Clinical  Decision  Making

3.1  An  Artificial  Intelligence  Approach  to  Clinical  Decision  Making

3.1.1  Background  and  Significance 

In  the  explosion  of  new  knowledge,  new  methods,  new  regulations,  stringent  pressures  to 
reduce  costs,  and  higher  expectations  from  patients  for  better  outcomes,  medicine  faces  a 
major  problem  of  information  management  and  utilization.  During  the  past  decade,  many 
independent  studies  of  this  complex  of  problems  have  settled  on  medical  informatics  as  the 
field  promising  to  help  alleviate  this  problem.  From  the  GPEP  report  of  the  early  1980’s, 
to  the  NLM’s  planning  reports  of  the  mid-1980’s,  to  this  year’s  recommendation  at  the 
Harvard  Medical  School  to  establish  a  center  for  medical  informatics  and  to  assure  that  all 
medical  students  are  trained  in  this  discipline,  the  need  for  more  sophisticated  computer- 
based  applications  in  medicine  is  clearly  identified.  These  are  to  be  applications  that  in  some 
sense  “understand”  the  content  of  the  information  they  manipulate. 

The idea that one can develop computer programs that assist in making diagnostic and
therapeutic decisions, or that track the ongoing state of a patient and comment on the
appropriateness of therapy is hardly novel, of course. Flowchart and statistical classification
models  dating  back  to  the  early  1960’s  have  played  a  small  beneficial  role  in  enhancing 
medical  care,  and  systems  of  almost  equal  vintage  that  rely  on  a  fairly  complete  online 
medical  record  have  provided  trend  analysis  and  simple  “sanity  checks”  for  evolving  patient 
cases.  Systems  based  on  online  medical  records  have  limited  their  reasoning  to  issues  that 
could  be  adequately  addressed  by  data  that  were  typically  available,  which  in  most  cases  fails 
to  capture  much  of  what  is  clinically  relevant.  The  scarcity  of  adequate  background  data,  and 
the  lack  of  modularity  and  internal  organizational  structure  in  the  classification  models,  has 
prevented  their  construction  for  large  medical  domains  and  has  seriously  impaired  the  ability 
of  developers  to  maintain  them  [264].  In  response  to  these  inadequacies,  researchers  turned 
in  the  early  1970’s  to  artificial  intelligence  methods,  to  provide  tools  for  building  programs 
with  more  “understanding.”  By  adopting  a  consultation  model,  where  the  program  is  to  be 
able  to  ask  questions  of  its  users,  the  inadequacy  of  computer-accessible  information  could 
be overcome (though of course at a high cost in time demanded of the user). Perhaps more
fundamentally,  these  programs  pursued  the  hypothesis  that  one  could  overcome  the  lack  of 
statistical  data  by  substituting  for  it  the  codified  expertise  of  human  expert  clinicians.  This 
is  not  to  say  that  one  would  simply  ask  people  to  guess  statistical  correlations  instead  of 
gathering  data  on  them.  Instead,  the  idea  was  to  discover  the  reasoning  and  problem-solving 
strategies  used  by  human  experts,  to  de-brief  them  of  the  knowledge  they  use,  in  the  form 
in  which  they  use  it,  and  then  to  build  computer  programs  that  operate  according  to  the 
same  principles  and  with  the  same  knowledge. 


3.2  State  of  the  Art  in  Medical  AI 


Since the early 1970’s, the field of medical AI has provided a number of impressive demonstrations
of programs able to capture human-like expertise and to apply it in human-like
ways that seem acceptable to their users. Systems such as the Present Illness Program [244]
and the Digitalis Therapy Advisor [131] from our group, and MYCIN, INTERNIST-1 [225], and
CASNET/Glaucoma [294], provided early indications that such AI programs could overcome
previous  methodological  limitations  to  provide  human-like  expertise  in  a  computer.  Indeed, 
tests  of  these  and  successor  programs  have  several  times  confirmed  that,  within  a  typically 
narrow  set  of  circumstances,  the  performance  of  the  program  was  (nearly)  indistinguishable 
from that of expert physicians [298] [225] [146] [34].

Sadly,  despite  these  documented  successes,  the  practical  utility  of  programs  of  this  sort  in 
medicine  remains  very  limited — essentially  only  a  few  such  programs  (PUFF,  ONCOCIN  and 
a  serum  electrophoresis  interpretation  program  built  into  an  instrument  by  Helena  Labora¬ 
tories)  receive  any  routine  use  [75],  and  then  typically  at  medical  centers  closely  associated 
with  their  developers.  Interestingly,  the  techniques  developed  for  some  of  these  medical  pro¬ 
grams  have  been  generalized  and  exported  to  commercial  and  industrial  areas  of  application, 
where  they  have  formed  the  basis  for  a  significant  “expert  systems”  revolution  [107]. 

Why  have  medical  AI  programs  not  succeeded  practically,  when  they  appear  to  have  suc¬ 
ceeded  in  the  laboratory?  One  hypothesis  is  that  the  fundamental  technology  of  these 
systems  is  fine,  but  that  much  more  engineering  effort  is  needed  to  bring  them  to  successful 
use.  Another,  which  motivates  us  here,  is  that  there  really  are  fundamental  deficiencies  in 
the  techniques  on  which  these  systems  are  based — deficiencies  that  prevent  their  functioning 
as  well  as  is  necessary  for  widespread  adoption. 

No  doubt  better  engineering  will  be  needed  for  the  success  of  medical  expert  systems.  Thus, 
work  is  needed  on  comprehensive  medical  record  systems  that  contain  a  timely,  complete  and 
accurate  view  of  the  patient  and  the  care  he  or  she  is  getting.  Outstanding  user  interfaces, 
which exploit the power of graphical output and voice input, would also be a boon. Improved
ancillary  services  to  train  potential  users  and  more  thoroughly  integrate  computer  systems 
into  the  fabric  of  health  care  are  also  likely  to  be  needed.  In  addition,  work  on  standards  for 
medical  terminology,  large  scale  knowledge  bases  organized  for  teaching  and  reference,  and 
integration  of  patient  care,  research,  library,  image,  and  electrical  signal  databases  into  a 
uniformly-accessible  information  system  is  an  important  goal.  Nevertheless,  we  subscribe  to 
the  second  of  the  above  hypotheses,  that  even  significant  advances  toward  all  these  laudable 
goals  would  leave  fundamental  gaps  in  our  ability  to  build  truly  usable  systems. 


3.3  Sources  of  Difficulty 

Among  the  major  difficulties  identified  in  building  medical  reasoning  systems  are  the  han¬ 
dling  of  multiple  interacting  diseases,  interactions  between  diseases  and  incomplete  or  only 
partially-effective  therapies,  the  need  to  take  time  into  account  in  both  diagnostic  and  ther¬ 
apeutic  reasoning,  and  real  time  constraints  on  decision  making.  Each  of  these  has  severely 
stressed  the  basic  mechanisms  of  even  the  most  successful  demonstrated  programs,  and  each 
suggests  that  there  is  much  additional  need  for  research. 

The  need  to  deal  with  unanticipated  interactions  has  meant  that  programs  have  turned  from 
relatively  simple  means  of  associating  clusters  of  abnormal  findings  with  disease  hypotheses 
to much more complex means of assembling hypotheses that represent multiple co-occurring
disorders.  Often,  this  has  required  new  causal  and  probabilistic  models  of  the  interactions 
among  disorders  and  the  ways  in  which  disorders  manifest  as  abnormal  findings.  Thus,  the 
knowledge  base  of  recent  medical  AI  programs  is  typically  quite  complex;  their  reasoning 
methods  are  multi-faceted  and  involved,  their  conclusions  are  difficult  to  explain  because  of 
this  complexity,  and  the  programs  are  hard  to  build,  debug  and  maintain. 

Five  years  ago,  we  suggested  one  approach  to  alleviating  some  of  these  problems,  based  on 
the  adoption  of  a  common  underlying  knowledge  representation  mechanism.  Unfortunately, 
though  we  have  shown  some  progress  in  this  direction,  the  existing  knowledge  representation 
formalisms  at  our  disposal  have  not  been  up  to  the  task — the  complexity  and  breadth  of  types 
of  knowledge  that  need  to  be  represented  in  medical  reasoning  overwhelms  the  abilities  of 
existing  techniques. 

In  addition,  as  the  medical  AI  field  has  matured  and  the  ambitions  of  specific  projects 
have  increased,  it  appears  to  take  longer  and  longer  for  an  interesting  new  idea  to  move 
from  conception  to  demonstration  in  an  effective  program.  The  need  to  hand-tailor  medical 
knowledge  bases  specific  to  a  particular  project,  as  well  as  to  develop  a  complex  set  of 
technical  capabilities,  has  meant  that  often  five  years  may  elapse  from  inception  of  an  idea 
to  its  initial  demonstration.  This  observation  naturally  leads  to  the  suggestion  that  perhaps 
computer  learning  techniques  could  serve  to  allow  the  machine  to  be  a  more  active  partner  in 
building  new  programs.  Until  recently,  we  felt  that  many  of  the  learning  methods  explored 
in  the  AI  literature  were  not  likely  to  be  directly  applicable  to  our  problems.  Such  methods 
fell  into  two  camps:  methods  based  on  statistical  learning  gave  no  place  to  hard-earned 
knowledge  that  we  already  possessed,  and  it  seemed  implausible  to  ask  automated  learning 
techniques  to  rediscover  all  the  existing  knowledge  of  medicine.  Conversely,  most  symbolic 
learning  methods  assumed  a  deterministic  underlying  domain,  in  which  noise  or  stochastic 
behavior  would  lead  either  to  no  learning  or  to  internal  contradictions. 


3.4  Artificial  Intelligence  and  Cardiovascular  Reasoning 

3.4.1  Diagnosis 

The heart-failure diagnosis program provides two types of diagnostic information: it determines
the probability of parameter states in the physiologic model, and it generates differential
diagnoses, each of which fully explains the set of findings.

The  model  for  diagnosis  consists  of  nodes  copied  from  the  parameter  states  with  binary  values 
and  measurement  values  (the  findings)  connected  by  links  with  probabilities  determined 
from the knowledge base and patient input. The probabilities are combined using a “noisy-or”
combination rule [245], except for worsening factors, which require another cause, and
correcting factors, which decrease the probability. Thus, if the causes are P, the worsening factors
W, the correcting factors C, and the primary probability p0, the probability of a node is:

if ∃ i ∈ P ∧ (p_i = 1.0) → 1.0, else [rest of formula illegible in source]
Similarly, each measurement value has a probability of being produced by nodes. The model
is  similar  to  those  investigated  by  Pearl  [245]  as  Bayesian  probability  networks.  The  difference 
is  that  this  model  has  forward  loops  (excluded  by  Pearl)  and  nodes  with  multiple  paths 
between  them  (handled  only  in  exponential  time  by  Pearl’s  methods).  We  investigated 
modifications to Pearl’s algorithm and to our model. However, even after eliminating the forward
loops in an earlier version of the model, there were still about 40 links that would have to be cut
to analyze multiple paths between nodes. Thus, Pearl’s algorithm would require weighted
summing of about 2^40 solutions, which is completely infeasible. Indeed, Cooper has shown
that  the  problem  is  NP-hard  [79].  Thus  heuristic  methods  are  necessary  to  handle  large 
networks. 
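
As a rough illustration of the default noisy-or rule referred to above, the following Python sketch combines independent cause probabilities with a primary probability p0. It covers only the default combination; the special handling of worsening and correcting factors is not reproduced. The function name and the example numbers are assumptions made for illustration and are not taken from the heart-failure model. The last line shows the size of the enumeration that cutting about 40 links would imply.

```python
from math import prod

def noisy_or(cause_probs, p0=0.0):
    """Default noisy-or: the node is absent only if every cause, and the
    primary term p0, independently fails to produce it."""
    return 1.0 - (1.0 - p0) * prod(1.0 - p for p in cause_probs)

# Hypothetical numbers, chosen only to exercise the rule.
print(noisy_or([0.5, 0.2], p0=0.1))

# Cutting about 40 links means one solution per true/false assignment.
print(2 ** 40)  # 1099511627776
```

A cause with probability 1.0 drives the result to 1.0, which matches the first clause of the rule.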

After  much  investigation  we  developed  a  mechanism  for  estimating  the  probability  of  a  node 
given  evidence.  It  is  based  on  the  causal  paths  from  primary  nodes  to  the  node  in  question. 
Since  only  about  50  of  the  150  nodes  in  the  model  are  primary  (having  a  non-zero  probability 
of  existing  without  some  other  cause),  it  generates  and  stores  all  of  the  paths  from  these  nodes 
to  all  others  and  computes  the  probabilities  along  those  paths.  The  paths  are  generated  when 
the  model  is  first  loaded  and  the  probabilities  are  computed  when  the  patient  data  is  entered. 
A  conservative  approximation  of  the  causal  probability  of  any  node  is  the  combination  of 
the  highest  probability  paths  from  each  of  the  primary  nodes  to  that  node,  assuming  the 
independence  of  the  causes  and  the  default  combination  rule.  Explicit  causal  combinations 
(e.g., the worsening factors) are handled by revising the probabilities along the paths for
the  estimated  probabilities  of  these  additional  factors.  This  mechanism  has  proven  to  be  an 
effective  way  of  estimating  the  causal  probabilities  of  nodes.  To  determine  the  probability  of 
a  node  we  treat  the  evidence  evaluation  problem  as  locally  computing  the  probability  of  the 
observed  effects  given  each  combination  of  causes.  This  mechanism  estimates  the  probability 
of  any  parameter  state  given  whatever  other  states  or  evidence  is  already  known. 
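
The path-based estimate described above can be sketched as follows. This is a minimal illustration, not the program's implementation: the graph encoding (a dictionary of child links), the function names, and the numbers are all assumptions, and the revision of path probabilities for worsening and correcting factors is omitted. The search finds the highest-probability path from each primary node, and the results are combined with the default noisy-or rule under the independence assumption.

```python
import heapq
from math import prod

def best_path_prob(links, source, target):
    """Highest product of link probabilities over any path source -> target.
    A Dijkstra-style search is valid because multiplying by a probability
    can never increase the product."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_p, node = heapq.heappop(heap)
        p = -neg_p
        if node == target:
            return p
        if p < best.get(node, 0.0):
            continue  # stale queue entry
        for child, link_p in links.get(node, []):
            q = p * link_p
            if q > best.get(child, 0.0):
                best[child] = q
                heapq.heappush(heap, (-q, child))
    return 0.0  # target not reachable from source

def causal_estimate(links, priors, target):
    """Noisy-or of the best path from each primary node to the target."""
    path_probs = [prior * best_path_prob(links, primary, target)
                  for primary, prior in priors.items()]
    return 1.0 - prod(1.0 - p for p in path_probs)

# Hypothetical three-cause network: A reaches X directly or through Y.
links = {"A": [("X", 0.8), ("Y", 0.5)], "Y": [("X", 0.9)], "B": [("X", 0.5)]}
priors = {"A": 0.1, "B": 0.2}
print(causal_estimate(links, priors, "X"))
```

Using only the single best path per primary node ignores the weaker alternative paths, which is why the estimate is conservative.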

Our  solution  to  the  differential  diagnosis  problem  is  to  generate  complete  hypotheses  (causal 
paths  from  primary  causes)  for  the  findings  and  present  the  user  with  a  list  of  hypotheses 
and  their  relative  total  probabilities  for  comparison.  In  comparing  hypotheses,  we  discovered 
that  the  natural  notion  of  different  hypotheses  requires  that  they  differ  in  some  significant 
node, nodes which we have labeled diagnostic. The algorithm is as follows: 1) check the
input  for  definite  implications,  findings  that  require  nodes  to  be  true  or  false;  2)  collect 
the  abnormal  findings  from  the  input;  3)  find  all  of  the  diagnostic  or  primary  nodes  that 
could  account  for  each  finding;  4)  rank  the  diagnostic  and  primary  nodes  by  the  number  of 
findings  they  account  for;  5)  use  the  better  of  these  as  seeds  for  finding  small  covering  sets 
of  primary  nodes;  6)  for  each  covering  set,  order  the  findings  by  the  difference  between  the 
first  and  second  highest  probability  path  to  it;  7)  for  each  finding,  the  best  path  from  the 
partial  hypothesis  is  found  and  added  to  it;  and  8)  the  hypothesis  is  pruned  of  unneeded 
primary  nodes  and  extra  paths  that  decrease  the  probability.  Finally,  the  probabilities  of  the 
hypotheses  are  computed  by  multiplying  the  probabilities  of  the  nodes  given  the  other  nodes 
in  the  hypothesis  and  they  are  rank  ordered  and  presented  to  the  user.  These  probabilities 
could  be  normalized  by  the  probability  of  the  findings  but  that  is  unnecessary  as  long  as  we 
are  only  rank  ordering  hypotheses.  The  algorithm  is  discussed  in  detail  in  a  paper  [204]. 
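
Steps 4 and 5 of the algorithm above can be illustrated with a small sketch. The report does not say how seeds are expanded into covering sets, so the greedy expansion below is an assumed stand-in, and the node and finding names are placeholders rather than entries from the knowledge base. Path selection, pruning, and probability computation (steps 6 through 8) are not shown.

```python
def rank_candidates(explains, findings):
    """Step 4: order candidate nodes by how many findings each accounts for."""
    return sorted(explains, key=lambda n: len(explains[n] & findings),
                  reverse=True)

def greedy_cover(explains, findings, seed):
    """Step 5 (simplified): grow a small covering set from a seed node."""
    chosen = [seed]
    uncovered = set(findings) - explains[seed]
    while uncovered:
        # Pick the node that explains the most still-uncovered findings.
        node = max(explains, key=lambda n: len(explains[n] & uncovered))
        if not explains[node] & uncovered:
            break  # nothing accounts for the rest; leave it unexplained
        chosen.append(node)
        uncovered -= explains[node]
    return chosen, uncovered

# Placeholder data: which findings each candidate node could account for.
explains = {"D1": {"f1", "f2", "f3"}, "D2": {"f3", "f4"}, "D3": {"f5"}}
findings = {"f1", "f2", "f3", "f4", "f6"}

ranked = rank_candidates(explains, findings)
cover, unexplained = greedy_cover(explains, findings, ranked[0])
print(ranked, cover, unexplained)
```

In this example the finding f6 has no candidate cause and is returned as unexplained instead of blocking the hypothesis.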

Figure 3.1: Congestive Cardiomyopathy and Renal Insufficiency Hypothesis

This approach to diagnosis differs considerably from others that have appeared in the literature.
Reggia’s minimal set covering approach [252] ignores the fact that the best hypothesis
may  not  be  minimal  and  would  not  find  the  hypothesis  in  Figure  3.1.  Other  approaches  to 
diagnosis  based  on  digital  circuit  analysis  [253] [88]  assume  that  every  node  is  primary  and 
every  node  can  be  measured.  If  every  node  were  treated  that  way,  a  network  of  this  size 
would  be  computationally  intractable. 

Our  mechanism  is  effective  for  producing  a  meaningful  set  of  hypotheses  for  the  findings 
and  it  usually  takes  less  than  a  minute  on  a  Symbolics  3650  workstation.  The  user  can 
compare  the  hypotheses,  see  explanations,  and  consider  the  differences.  Figure  3.1  is  the 
display  of  the  first  of  five  hypotheses  generated  for  an  actual  patient  with  findings  that 
included  rales,  pedal  edema,  high  BUN,  nausea,  S3,  and  runs  of  VT.  The  display  graphically 
presents  the  complete  explanation  for  the  findings  and  provides  a  textual  summary  of  the 
case  at  the  bottom  of  the  screen.  In  the  display,  the  findings  are  in  lower  case,  intermediate 
nodes  in  upper  case,  primary  nodes  in  bold  face,  primary  probabilities  in  parentheses,  causal 
probabilities on links and W+ indicating worsening factors that increase the probability
and  P-  indicating  correcting  factors  that  decrease  it.  This  hypothesis  accounts  for  the 
findings  with  congestive  cardiomyopathy  and  renal  insufficiency,  while  the  second  hypothesis 
accounts  for  the  findings  with  congestive  cardiomyopathy  alone.  Those  hypotheses  nicely 
capture  the  physician’s  initial  dilemma:  whether  the  high  BUN  was  renal  or  prerenal.  Other 
hypotheses  included  valve  disease,  which  is  an  important  consideration.  This  hypothesis 
illustrates several features of the algorithm: 1) it handles multiple causes; 2) it handles
multiple  pathways  between  nodes;  3)  findings  can  be  left  unexplained  (the  murmur);  and 
4)  iatrogenic  causes  (digitalis  toxicity  here)  are  handled.  This  kind  of  explanation  is  a  rich 
source  of  information,  proposing  mechanisms,  showing  assumptions,  showing  where  therapies 
might  be  beneficial,  and  providing  enough  information  for  the  user  to  judge  whether  the 
hypothesis  is  really  justified.  (This  example  is  discussed  in  more  detail  in  [205].) 

This  method  of  generating  hypotheses  is  heuristic  and  indeed  it  is  possible  to  construct 
networks  where  it  does  not  find  the  best  answer.  (Notice  that  only  the  search  is  heuristic, 
not  the  use  of  probabilities.)  However,  we  have  tested  over  60  actual  cases  thus  far  as  well 
as  many  created  cases  and  have  found  the  algorithm  to  be  effective.  On  one  set  of  42  cases, 
collected  while  developing  the  algorithm,  the  performance  was  tabulated.  In  31  of  these 
the  program  produced  reasonable  hypotheses.  In  five  the  hypotheses  were  almost  right  but 
parts  of  the  mechanisms  were  inappropriate.  In  the  other  six  cases  the  best  hypothesis  was 
missed.  There  were  two  main  reasons  for  these  problems:  1)  the  program  did  not  reason 
appropriately  with  the  temporal  relationships  between  cause  and  effect,  and  2)  it  did  not 
handle  severity  relations  appropriately. 

3.4.2  Summary 

We  started  with  a  vision  to  build  a  qualitative  physiologic  model  and  develop  strategies  for 
diagnostic  and  therapeutic  reasoning  using  the  logical  relationships  between  the  physiologic 
entities  and  input  values.  From  the  experience  gained  from  this  system,  we  recognized 
the  need  for  a  probabilistic  approach  to  diagnosis  and  a  quantitative  approach  to  therapy 
prediction.  To  fulfill  these  needs,  we  created  a  practical  method  for  heuristically  finding  the 
best  explanations  for  a  set  of  findings  in  a  large  causal  probability  network  and  created  a 
new  method  for  predicting  changes  in  a  network  of  constraint  equations  based  on  signal  flow 
analysis.  In  addition,  we  developed  a  method  for  using  the  causal  model  to  guide  case  based 
reasoning,  a  statistical  method  for  predicting  behavior,  a  control  strategy  for  time  dependent 
data,  and  a  method  for  attributing  causes  to  effects  over  time.  These  methods  give  us  the 
basic  tools  needed  to  develop  an  effective  program  for  assisting  physicians  in  reasoning  about 
complex  cases  over  single  or  multiple  sessions. 

3.4.3  Knowledge  Representation  and  Default  Reasoning 

Jon  Doyle  continued  his  investigation  of  artificial  intelligence  using  theories  and  techniques 
from  economics  and  decision  theory,  working  in  conjunction  with  Michael  Wellman  (USAF) 
and  Ramesh  Patil.  These  investigations  yielded  numerous  papers:  Wellman  and  he  improved 
their  treatments  of  default  reasoning  [99],  which  it  now  appears  will  be  published  in  a  special 
issue  of  Artificial  Intelligence  on  the  best  papers  from  the  KR’89  Conference  in  Toronto.  His 
paper  with  Patil  on  knowledge  representation  languages  [97]  will  be  published  in  Artificial 
Intelligence  with  a  response  article  by  Ron  Brachman  and  Hector  Levesque.  An  expansion  of 
his  paper  with  Elisha  Sacks  (Princeton)  on  probabilistic  qualitative  reasoning  was  presented 
at  IJCAI  [98],  and  has  been  submitted  to  Computational  Intelligence.  A  paper  on  the 
philosophical foundations of AI will be appearing in an MIT Press collection on Philosophy
and AI [92]. He will be presenting a paper on rational belief revision at the third Workshop
on  Nonmonotonic  Reasoning  in  June  [93].  Finally,  he  gave  two  invited  talks  this  year,  and 
will  be  giving  two  more  in  July.  His  invited  talk  at  the  Conference  on  the  Dynamics  of 
Belief  (Sweden)  will  be  appearing  in  a  Springer  volume  on  the  Logic  of  Theory  Change  [94]. 
His  invited  talk  at  ISMIS’89  appeared  in  a  proceedings  volume  from  North-Holland  [91]. 
He  will  be  speaking  on  rational  self-government  at  the  Second  International  Conference  on 
Economics  and  AI  (Paris)  in  July  [95],  and  giving  an  invited  address  on  the  roles  of  rationality 
in  AI  at  AAAI’90  in  August  [96]. 

3.5  Student  Progress 

3.5.1  Cardiac  Arrhythmia  Classification  (Scott  Greenwald) 

Automated  cardiac  arrhythmia  detectors  perform  well  at  detecting  and  classifying  beats  in 
clean  data,  even  when  compared  with  humans.  However,  the  performance  of  automated 
systems  degrades  dramatically  in  the  presence  of  electrode  motion  noise  because  of  falsely 
detected  QRS-like  artifacts  and  misclassified  noise  distorted  beats.  Current  systems  detect 
and  classify  beats  using  a  very  limited  scope  of  the  available  information  surrounding  can¬ 
didate  beats.  In  contrast,  human  experts  perform  well  in  noise  corrupted  data  because  they 
use  prior  knowledge  of  general  principles  of  electrocardiology  and  information  gleaned  from 
the  patient’s  clean  ECG,  and  because  they  make  use  of  a  wide  context  of  ECG  surrounding 
candidate  beats.  This  thesis  has  explored  the  hypothesis  that  the  use  of  wide  contextual  in¬ 
formation  within  the  electrocardiogram  and  prior  knowledge  of  the  signal  source  can  improve 
beat  detection  and  classification,  and  enhance  artifact  rejection  in  the  field  of  automated  ar¬ 
rhythmia  analysis.  We  tested  this  hypothesis  by  developing  an  expert  system  (HOBBES) 
that  emulates  techniques  used  by  human  experts  in  processing  ECGs.  HOBBES  was  con¬ 
structed  as  a  post-processor  to  a  classical  arrhythmia  detector  (ARISTOTLE).  The  output 
of  ARISTOTLE  is  an  annotation  stream  which  contains  an  entry  for  each  detected  event. 
Each entry contains ARISTOTLE’s suggested beat label, the detection time, an estimate of
the noise within the detected event, and a composite QRS morphology measure. HOBBES
analyzes ARISTOTLE’s annotation stream in three passes and creates a final, error-reduced
annotation  stream  as  its  output.  On  the  first  pass,  HOBBES  learns  contextual  information 
by  developing  a  nine-dimensional  feature  space  description  of  patterns  of  five-beat  sequences 
seen  in  clean  data.  Each  five-beat  sequence  is  represented  by  the  five  QRS  morphology  fea¬ 
tures  and  the  four  (scaled)  interbeat  intervals  within  the  five-beat  sequence.  On  the  second 
pass,  HOBBES  finds  clearly  identifiable  “landmark”  beats  (based  on  morphology  or  beat 
arrival  times)  within  the  noisy  data.  On  the  final  pass,  HOBBES  compares  sequential  over¬ 
lapping  sequences  of  the  noisy  annotation  stream  to  hypothetical  five-beat  sequences  based 
upon  patterns  learned  in  clean  data.  The  best-fitting  hypotheses  are  selected,  and  final  beat 
labels  are  assigned  to  each  event.  The  performance  of  human  experts,  ARISTOTLE,  and 
HOBBES  was  evaluated  on  a  database  of  half-hour  ECG  records.  Ten  percent  of  each  record 
was selectively corrupted by adding three minutes of electrode motion noise from an independent
noise database. The results indicate that although the performance of HOBBES
is worse than that of human experts for many of the records, HOBBES has significantly enhanced
ARISTOTLE’s  performance  in  processing  noisy  ECGs. 
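
The nine-dimensional description of a five-beat sequence can be made concrete with a short sketch. The report states only that the four interbeat intervals are scaled; dividing by the mean interval of the window is an assumption made here, as are the function name and the sample values.

```python
def five_beat_features(morphology, times, i):
    """Nine features for the five beats starting at index i: five QRS
    morphology values followed by four scaled interbeat intervals."""
    m = morphology[i:i + 5]
    t = times[i:i + 5]
    intervals = [b - a for a, b in zip(t, t[1:])]
    mean_interval = sum(intervals) / len(intervals)
    scaled = [iv / mean_interval for iv in intervals]  # assumed scaling
    return list(m) + scaled

# Hypothetical morphology measures and beat arrival times in seconds.
morphology = [1.0, 1.0, 2.5, 1.0, 1.0, 1.0]
times = [0.0, 0.8, 1.6, 2.0, 3.2, 4.0]

# Overlapping windows, one starting at each beat that has four successors.
windows = [five_beat_features(morphology, times, i)
           for i in range(len(times) - 4)]
print(len(windows), len(windows[0]))
```

Scaling by the window's own mean makes a pattern such as an early beat followed by a long pause look the same at different heart rates.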

3.5.2  Repeated  Sequences  and  Parameter  Uncertainty  in  Steady-state  Systems 
(Alex  Yeh) 

Mr.  Yeh  has  been  designing  and  implementing  a  program  called  AIS  (short  for  Analyzer  of 
Iterated  Sequences)  that,  when  given  a  state-description  of  a  system  and  a  sequence  of  actions 
or  transformations  on  that  state,  symbolically  finds  some  of  the  extreme  and  time-averaged 
effects  of  continually  iterating  that  sequence.  The  specific  effects  found  at  present  include  1) 
the  extreme  values  of  parameters  that  vary  periodically  with  each  iteration,  2)  the  symbolic 
average  rate  of  change  in  parameters,  and  3)  an  assessment  of  how  those  rates  of  change  would 
be  different  with  different  values  for  various  constants  and  functions  (sensitivity  analysis). 
The  sequences  handled  by  AIS  are  ones  which  have  the  following  “constancy”:  the  sequence 
always  repeats  the  same  actions  in  the  same  order  and  each  occurrence  of  a  given  action 
always  changes  the  parameters  by  the  same  amounts.  Examples  of  such  iterated  sequences 
of  actions  include  the  ones  taken  by  a  heart  in  going  through  a  beat  cycle  at  steady-state  and 
the  actions  taken  by  a  steam  engine  in  making  one  rotation  of  its  drive  shaft  at  steady-state. 
Effects  to  be  found  include  the  extreme  pressures  in  an  engine,  the  average  rate  at  which 
blood  enters  the  heart,  and  how  increasing  that  entering  blood’s  pressure  affects  that  rate. 
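
A numeric analogue of the first two kinds of effect may help fix ideas. AIS works symbolically; the sketch below substitutes concrete numbers, and its function name, parameters, and values are assumptions. It relies on the "constancy" property: because each action always changes a parameter by the same amount, the extremes within a cycle and the net change per cycle follow from the running sums of those changes.

```python
from itertools import accumulate

def cycle_effects(deltas, duration):
    """deltas: changes one action sequence makes to a single parameter,
    in order. duration: time taken by one full iteration."""
    offsets = [0.0] + list(accumulate(deltas))  # value relative to cycle start
    net = offsets[-1]
    return {
        "max_offset": max(offsets),
        "min_offset": min(offsets),
        "net_per_cycle": net,
        "average_rate": net / duration,
    }

# Hypothetical beat cycle lasting 0.8 s: the chamber fills, then ejects.
volume = cycle_effects([70.0, -70.0], duration=0.8)   # periodic parameter
pumped = cycle_effects([0.0, 70.0], duration=0.8)     # accumulating parameter
print(volume["max_offset"], volume["net_per_cycle"], pumped["average_rate"])
```

The periodic parameter has zero net change and a bounded swing, while the accumulating parameter has a steady average rate, the two cases the text distinguishes.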

One  motivation  for  finding  such  effects  is  to  find  what  stresses  a  device  needs  to  tolerate, 
such  as  the  maximum  pressure  an  engine  or  heart  is  subject  to.  A  second  motivation  is 
that many periodic subsystems iterate at such a fast rate that the other parts of a system
respond only to the behavior of such a subsystem β averaged over many iterations. Then
a steady-state model for the entire system would only require a description of β’s averaged
behavior; β can be modeled as a constant iteration of the same sequence of parameter value
changes.  Examples  of  such  subsystem  and  system  combinations  include  1)  the  heart  and  the 
human  circulatory  system,  and  2)  an  engine  and  a  car. 

Yeh  worked  on  three  examples  of  using  AIS.  The  first  concerns  a  normal  ventricle  (part  of 
the  heart).  AIS’s  results  are  similar  to  results  either  derived  by  others  by  hand  or  determined 
empirically  from  experiments.  The  second  example  is  on  a  ventricle  with  a  disease  called 
mitral  stenosis.  The  model  in  this  example  is  larger  and  more  ambiguous  (“qualitative”) 
than  in  the  first  example.  The  example  indicates  that  AIS  can  handle  fairly  ambiguous 
models,  but  that  the  results  will  reflect  that  ambiguity.  The  third  example  changes  domains 
and  is  on  a  steam  engine.  This  example  is  like  the  second  in  that  it  is  larger  than  the  first. 
But  unlike  the  second  one,  the  steam  engine  model  is  a  lot  more  precise  on  the  forms  of 
the  functions  involved,  and  AIS’s  output  reflects  this.  He  incorporated  AIS’s  steam  engine 
results  into  a  simple  steady-state  model  of  a  train. 

3.5.3  Multi-criteria  Operator  Selection  (Dennis  Fogg) 

Automated  synthesis  of  VLSI  architectures  requires  selecting  operators  to  implement  arith¬ 
metic  operations.  Two  solutions  to  this  task  are  presented.  The  first  solution  extends 
current AI techniques in parametric design by considering multiple performance criteria.
The  fundamental  obstacle  in  multi-criteria  design  is  incorporating  user  preferences  about 
tradeoffs  between  performance  criteria.  We  present  a  new  framework,  called  Heuristically 
Guided  Enumeration  (HGE)  that  uses  expert  knowledge  to  guide  enumeration  toward  opti¬ 
mal  designs,  and  uses  heuristics  to  control  enumeration  so  better  designs  are  created.  HGE 
encourages  the  user  to  explore  the  frontier  of  optimal  designs  by  generating  fast  approxima¬ 
tions  to  the  frontier.  The  user  controls  the  enumeration  by  defining  particular  regions  of  the 
design  space  to  explore  and  limiting  the  cardinality  of  the  designs  created.  An  implemented 
system  enumerates  one  millionth  of  the  design  space  yet  produces  near  optimal  results.  The 
second  solution  is  based  on  mathematical  optimization  methods.  We  formulate  the  opera¬ 
tor  selection  task  as  a  linear  programming  (LP)  problem.  The  LP  solution  defines  a  lower 
bound  on  the  optimal  design,  and  subsequent  processing  to  discretize  the  solution  produces 
an  upper  bound.  The  LP  approach  is  combined  with  Leiserson  and  Saxe’s  retiming  method 
to  simultaneously  select  operators  and  insert  pipeline  registers. 
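
The "frontier of optimal designs" mentioned above can be illustrated for two criteria. The sketch assumes the criteria are area and delay, both to be minimized, and uses invented operator names and numbers; the report does not list the criteria actually used. It shows only what the frontier is, not how HGE or the LP formulation approximates it.

```python
def pareto_frontier(designs):
    """Keep designs not dominated on (area, delay); smaller is better."""
    frontier = []
    # After sorting by area, a design is on the frontier only if its delay
    # beats every design that is at least as small.
    for name, area, delay in sorted(designs, key=lambda d: (d[1], d[2])):
        if not frontier or delay < frontier[-1][2]:
            frontier.append((name, area, delay))
    return frontier

# Hypothetical adder implementations: (name, area, delay).
designs = [
    ("ripple", 10, 40),
    ("carry_select", 25, 22),
    ("lookahead", 30, 15),
    ("oversized", 35, 30),
]
print([name for name, _, _ in pareto_frontier(designs)])
```

The design labeled oversized is dropped because another design is both smaller and faster; choosing among the remaining three depends on the user's preferred tradeoff.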

3.5.4  Case-based  Reasoning  (David  Aghassi) 

In  routine  problem  solving,  people  reason  from  experience,  remembering  their  solutions  to 
recurrent  problems  rather  than  reconstructing  them  from  scratch  each  time.  The  method  of 
case-based  reasoning  attempts  to  exploit  this  intuitive  strategy  on  a  computer  by  maintaining 
a  memory  of  precedents,  and  by  solving  a  new  case  according  to  the  solution  of  the  most 
suitable  precursor.  Diverse  applications  of  the  method  seem  to  suggest  its  viability,  but 
a  widespread  lack  of  thorough  evaluation  questions  this  support.  Indeed,  while  previous 
work implies that case-based reasoning is successful for a variety of domains, few papers
identify  the  general  relationships  between  performance  and  the  domain  characteristics  and 
scaling  factors  that  underlie  it.  Thus,  researchers  are  left  without  an  understanding  of  the 
method’s  scope  or  scale,  and  intuitions  about  human  experience  continue  to  be  its  primary 
justification. 
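
The basic retrieval step of case-based reasoning can be sketched in a few lines. The similarity measure here (Jaccard overlap of finding sets) is an assumed stand-in chosen for brevity and is not the measure used by any particular system; the case data are placeholders.

```python
def jaccard(a, b):
    """Overlap between two sets of findings, from 0 (disjoint) to 1 (equal)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def solve_by_precedent(new_findings, memory):
    """Return the stored solution of the most similar precedent,
    together with its similarity score."""
    best = max(memory, key=lambda case: jaccard(new_findings, case["findings"]))
    return best["solution"], jaccard(new_findings, best["findings"])

# Placeholder memory of precedents.
memory = [
    {"findings": {"f1", "f2"}, "solution": "S1"},
    {"findings": {"f2", "f3", "f4"}, "solution": "S2"},
]
solution, score = solve_by_precedent({"f2", "f3"}, memory)
print(solution, round(score, 3))
```

Reporting the similarity score alongside the solution makes it possible to ask how often a close precedent exists at all, which bears directly on questions of scope and scale.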

David  Aghassi’s  work  addresses  many  of  these  open  concerns  in  the  context  of  heart  failure 
diagnosis,  evaluating  the  existing  case-based  reasoner  CASEY  with  respect  to  a  pool  of  240 
patients.  To  investigate  the  method’s  scale,  he  measured  the  effects  of  increasing  experience 
on  both  accuracy  and  efficiency.  He  also  analyzed  the  distribution  of  cases  in  order  to  quantify 
its  intrinsic  regularity,  thus  exposing  the  dependence  of  the  system’s  utility  on  the  domain 
and  facilitating  an  extrapolation  of  this  utility  to  other,  similarly  characterized  applications. 
First,  he  gauged  the  recurrence  of  similar  cases  in  varying  size  collections  of  patients;  second, 
he  measured  the  correlation  between  symptomatic  similarity  and  diagnostic  similarity;  and 
finally,  he  appraised  the  absolute  diagnostic  homogeneity  of  the  case  pool. 

Because  cardiologists  claim  that  most  cases  are  variations  on  recurring,  well  understood 
pathophysiologic  themes,  he  expected  to  justify  the  application  and  verify  the  presumed 
regularity  upon  which  its  success  depends.  Instead,  he  discovered  that  CASEY’s  accuracy 
does  not  increase  with  experience,  while  its  efficiency  degrades  with  the  number  of  avail¬ 
able  precedents.  Fundamentally,  similar  cases  and  similar  diagnoses  were  rare  among  the 
240  patients,  and  moreover,  symptomatic  resemblances  did  not  guarantee  diagnostic  cor¬ 
respondence.  Because  of  the  varying  combination  and  interaction  of  multiple  diseases,  the 
patients were largely heterogeneous, suggesting that the regularity described by cardiologists
occurs  at  a  more  detailed  level  of  abstraction,  perhaps  in  the  recurrence  of  diagnostic  syn¬ 
dromes  comprised  within  the  cases.  This  more  fine-grain  uniformity  can  be  exploited  only 
by  analyzing  precedents,  rather  than  by  applying  them  in  their  entirety. 

3.5.5  User  Modeling  for  Medical  Dialogue  Systems  (Ira  Haimowitz) 

We  describe  a  dialogue  system  between  an  expert  system  and  its  users  which  combines 
two  recent  hypotheses.  First,  that  the  dialogue  system  should  explicitly  model  both  the 
person  directly  interacting  with  the  dialogue  system  (the  agent)  and  the  person  reasoned 
about  by  the  expert  system  (the  patient)  in  order  to  communicate  meaningfully  with  both 
people.  Second,  that  a  dialogue  system  can  model  the  domain-related  beliefs,  preferences 
and  concerns  of  both  its  users  and  generate  responses  empathetic  to  both. 

This  dialogue  system  is  called  SERUM,  standing  for  “System  for  Empathetic  Responses  with 
User  Models.”  SERUM  generates  natural  language  responses  about  attribute  values  of  domain 
objects  via  three  transformations.  First,  the  system  converts  properties  of  the  agent  and 
patient,  and  domain  knowledge,  into  pragmatic  objectives  such  as  empathy.  Second,  SERUM
converts  the  pragmatic  objectives  into  surface  structure  cues,  such  as  object  emphasis  and  level  of
technicality.  Finally,  SERUM  uses  the  surface  structure  cues  to  realize  text  that  is  natural  and
appropriately  technical,  and  that  downplays  or  offsets  information  that  is  unpleasant  or
undesirable  to  the  agent  or  patient.  SERUM  is  demonstrated  in  the  medical  domain  of
lung  disease  for  AIDS  patients,  a  sensitive  domain  where  empathetic  responses  can  be  quite 
important. 

3.5.6  Temporal  Control  Structure  (Thomas  Russ)

Thomas  Russ  continued  the  development  of  the  Temporal  Control  Structure,  an  expert 
systems  development  shell  for  the  construction  of  time  dependent  monitoring  systems.  The 
major  improvements  were  gains  in  the  run-time  efficiency  of  the  implementation,
the  creation  of  development  tools  for  application  program  development,  and  the  addition  of 
specialized  modules  that  provide  temporal  abstractions  of  data  such  as  dynamic  calculations 
of  fluid  and  electrolyte  balance.  Agendas  for  handling  asynchronous  events  were  also  added 
to  the  system. 

Mr.  Russ  is  currently  extending  that  development  and  constructing  a  prototype  application 
in  the  acute  care  of  diabetic  ketoacidosis.  This  prototype  will  provide  ongoing  management 
advice  for  insulin,  fluid,  and  electrolyte  therapy  in  the  acute  phase  of  diabetic  ketoacidosis. 
Following  development,  retrospective  medical  record  trials  will  be  conducted  during  June 
and  July. 

In  the  past  year  he  explored  the  role  of  hindsight  in  clinical  decisions  and  implemented  a 
demonstration  program  showing  how  such  reasoning  is  supported  by  the  Temporal  Control 
Structure  [257].  The  implementation  of  hindsight  was  accomplished  through  the  use  of 
dependency-directed  updating  and  mechanisms  for  passing  information  both  forwards  and 
backwards  along  the  time  line. 



3.5.7  Symptom  Clustering  (Thomas  Wu) 

Thomas  Wu’s  research  deals  with  a  new  representation  and  algorithm  based  on  symptom 
clustering  for  diagnosing  multiple  disorders.  The  symptom  clustering  approach  partitions 
symptoms  into  causal  groups,  in  contrast  to  the  existing  candidate  generation  approach, 
which  assembles  sets  of  disorders,  or  candidates.  In  other  words,  the  candidate  generation 
approach  explores  ways  to  put  disease  hypotheses  together;  the  symptom  clustering  approach 
explores  ways  to  put  symptomatic  evidence  together. 

Symptom  clustering  achieves  efficiency  by  generating  aggregates  of  candidates  rather  than  individual
candidates,  and  by  representing  them  implicitly  in  a  Cartesian  product  form.  Search
criteria  of  parsimony,  subsumption,  and  spanning  can  help  narrow  the  symptom  clustering 
search  space.  A  problem-reduction  search  algorithm  has  been  devised  to  explore  this  space 
efficiently.  Experimental  results  on  a  large  knowledge  base  indicate  that  symptom  clustering 
yields  a  near-exponential  increase  in  performance  compared  to  candidate  generation.  For
example,  some  complex  cases  that  require  several  hours  to  solve  using  the  candidate  generation
approach  can  now  be  solved  in  a  matter  of  seconds  using  the  symptom  clustering
approach. 
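
As an illustration only, the following Python sketch shows how a partition of symptoms into causal groups can stand for many candidates at once. The knowledge base `CAUSES`, the function names, and the rule that a cluster is explained by any disorder common to all of its symptoms are assumptions made for this example; they are not taken from the actual system.

```python
from itertools import product

# Hypothetical toy knowledge base: symptom -> disorders that can cause it.
CAUSES = {
    "fever": {"flu", "pneumonia"},
    "cough": {"flu", "pneumonia", "bronchitis"},
    "rash": {"measles", "allergy"},
}

def cluster_explanations(clustering):
    """For each cluster, the disorders that could explain every symptom in it."""
    return [set.intersection(*(CAUSES[s] for s in cluster)) for cluster in clustering]

def count_candidates(clustering):
    """Number of candidates represented implicitly, without enumerating them."""
    n = 1
    for disorders in cluster_explanations(clustering):
        n *= len(disorders)
    return n

def expand(clustering):
    """Enumerate the candidates explicitly (one disorder per cluster), on demand."""
    return [frozenset(choice) for choice in product(*cluster_explanations(clustering))]

# One clustering: fever and cough share a cause, rash has a separate cause.
clustering = [("fever", "cough"), ("rash",)]
print(count_candidates(clustering))
print(expand(clustering))
```

The search manipulates the per-cluster sets, so the cost grows with the number of clusters rather than with the number of candidates in their product.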

In  addition  to  this  theoretical  foundation,  some  preliminary  work  on  probabilistic  evaluation
of  symptom  clusterings  has  been  completed.  Future  research  will  investigate  heuristic
guides  for  symptom  clustering  algorithms,  including  syndromic  knowledge,  which  seems  to 
be  important  in  human  clinical  cognition. 

3.5.8  Decision  Models  (Tze-Yun  Leong) 

Characterizing  the  knowledge  involved  in  decisions  illuminates  the  representational  and  computational
requirements  for  the  decision-analytic  approach  to  automated  clinical  decision
making.  This  work  analyzes  the  medical  knowledge  required  for  formulating  decision  models
in  the  domain  of  pulmonary  infections  (PIs)  with  suspected  acquired  immunodeficiency  syndrome
(AIDS).  Based  on  the  analysis,  a  knowledge  representation  framework  is  proposed.
The  framework  is  evaluated  by  showing  how  it  supports  decision  model  formulation  for  an 
example  case. 

Aiming  to  support  dynamically  generated  decision  models,  the  knowledge  characterization 
focuses  on  the  structural  aspects  of  the  decision  problem,  such  as  the  clinical  context,  the 
classes  of  evidence,  hypotheses,  tests,  treatments,  outcomes,  and  the  behavioral  relationships 
among  them.  Concepts,  which  model  the  objects,  states,  processes,  and  their  attributes  in 
the  clinical  setting,  are  the  basic  building  blocks  of  the  representation  design.  A  language 
with  set-theoretic  and  probabilistic  semantics  is  devised  to  describe  the  concepts  and  their 
context-independent  and  context-dependent  relationships. 
4.1  Introduction


Our  group  is  interested  in  general  purpose  parallel  computation.  Our  approach  is  centered 
on: 

•  Declarative,  implicitly  parallel  languages. 

•  Dataflow  architectures,  which  are  scalable  because  of  their  tolerance  of  increased  memory
latencies  and  support  for  frequent  synchronization.  Our  vehicles  for  research  include
an  abstract  “Explicit  Token  Store”  architecture  (ETS),  hardware  prototype  implementations
of  ETS  (called  Monsoon),  various  software  emulators  (GITA,  MINT),
and  a  software  emulator  for  a  new  proposed  architecture  called  P-RISC.

•  Sophisticated  compiling  and  run-time  systems  for  Id,  both  for  dataflow  and  other 
architectures.  We  have  also  explored  the  use  of  dataflow  compiling  for  an  experimental 
persistent  programming  language. 

•  Applications  programs  to  guide  the  language,  compiler,  and  architecture  research. 

Last  year,  we  reported  that  we  began  negotiations  with  Motorola  for  a  project  to  produce, 
as  a  research  prototype,  a  complete  system  running  Id  on  dataflow  machines  using  the 
Monsoon  processor  architecture.  This  year,  MIT-Motorola  cooperation  has  moved  into  high 
gear.  This  involves  extensive  and  daily  cooperation  in  the  design  and  production  of  the 
Monsoon  hardware,  and  in  the  design  and  production  of  the  Id  programming  environment. 
Two-node  prototypes  are  expected  by  the  end  of  summer  1990,  and  16-node  machines  by 
spring  1991.  To  this  end,  a  formal  cooperation  agreement  has  been  signed,  and  Motorola 
has  established  and  staffed  the  new  Motorola  Cambridge  Research  Center  at  One  Kendall 
Square,  next  door  to  LCS. 

Our  main  research  vehicle  for  programming  languages  is  Id,  which  has  fine-grained,  implicit 
parallelism.  We  have  been  able  to  formalize  our  incremental  typing  system  for  Id  and  to  prove 
it  correct.  We  have  made  much  progress  in  developing  the  “manager”  construct  in  Id,  which 
is  a  disciplined  way  of  using  imperatives  while  retaining  fine-grained,  implicit  parallelism  and 
synchronization.  We  have  continued  our  work  in  formalizing  Id’s  operational  semantics  in 
terms  of  abstract  reduction  systems.  New  applications  in  Id  include  the  Traveling  Salesman 
Problem  using  simulated  annealing,  the  Viterbi  Search  from  speech  recognition  systems,  and 
various  sparse  matrix  algorithms. 

We  have  made  an  almost  complete  transition  from  the  TTDA  (Tagged  Token  Dataflow  Architecture)
compilation  schemas  to  new  schemas  that  incorporate  the  notion  of  frame  storage
in  an  integral  way.  Frame  storage  is  now  used  for  extensive  loop  optimizations.  A  new  backend
translates  these  frame-oriented  dataflow  graphs  into  code  for  Monsoon.  We  have  been
studying  resource  management  for  Id  in  great  detail,  including  compiler-directed  garbage
collection,  as  well  as  numerous  versions  of  frame  and  heap  managers  for  improved  concurrency.

Id  World  has  been  ported  to  the  UNIX  environment  (from  our  original  Lisp  Machine  environment);
the  port  is  complete  and  has  been  distributed  outside  MIT.

The  Monsoon  wire-wrap  prototype,  which  has  now  been  running  for  over  a  year,  has  been
invaluable  for  testing  our  ideas  in  resource  management  in  Id,  for  measuring  instruction 
mixes,  and  for  designing  its  successor. 

The  second  generation  Monsoon  processor  has  been  designed  using  various  ASICs.  Motorola 
has  done  the  board-level  design  and  is  fabricating  it.  The  processor  incorporates  substantial 
improvements  from  the  wire-wrap  prototype  in  speed,  functionality,  and  connection  to  the 
UNIX  world.  An  I-structure  board  has  also  been  designed  and  is  being  fabricated  by  Motorola.
The  Monsoon  interconnection  network  switching  chips  (PaRCs)  and  data  link  chips
(DLCs)  have  been  designed  and  fabricated,  and  are  undergoing  testing.  A  4-by-4  network 
board  has  been  designed  by  Motorola  and  is  being  fabricated. 

In  other  work:  Vinod  Kathail  has  completed  his  Ph.D.  thesis  on  optimal  interpreters  for  the 
lambda-calculus;  we  have  continued  our  research  on  P-RISC,  a  synthesis  of  von  Neumann 
and  dataflow  architectures;  we  are  close  to  having  a  stock  hardware  implementation  of  Id; 
and,  we  are  close  to  having  a  dataflow  implementation  of  a  parallel  persistent  language. 

In  addition  to  cooperation  with  Motorola,  we  continue  to  maintain  strong  and  active  contacts 
with  several  other  dataflow  researchers  outside  MIT.  Members  of  our  group  have  participated 
in  the  international  committee  that  designed  the  new,  standard,  functional  programming 
language  Haskell. 

4.2  Personnel  and  Visitors 

In  January  1990,  Greg  Papadopoulos  was  appointed  to  the  MIT  faculty  in  the  Department 
of  Electrical  Engineering  and  Computer  Science.  He  has  been  a  member  of  our  research  staff 
since  August  1988. 

In  December,  Arthur  Altman  of  Texas  Instruments  completed  a  year  as  a  visitor  in  our 
group,  and  has  transferred  to  Steve  Ward’s  Computer  Architecture  Group. 

Rudiger  Kreuter  from  Siemens,  Germany,  spent  the  Fall  of  1990  as  a  visitor  in  our  group.  In 
addition  to  learning  about  Id  and  dataflow,  he  studied  the  implementation  of  3D  graphics 
in  Id. 

As  usual,  we  have  had  a  steady  stream  of  international  scholars  for  short  visits  and  talks. 

4.3  MIT-Motorola  Collaboration  on  Id  and  Monsoon 

Through  the  concerted  efforts  of  Albert  Vezza,  Associate  Director  of  LCS,  and  Jerzy  Skibinski,
Vice  President  of  Motorola’s  Microcomputer  Division,  a  joint  Research  Agreement  with
Motorola’s  Computer  Division  of  Tempe,  Arizona  was  formalized  in  August  1989,  although 
cooperation  had  been  ongoing  for  seven  months  in  anticipation  of  the  signing. 

The  joint  effort  will  result  in  at  least  three  16-node  Monsoon  research  prototypes  and  at  least 
sixteen  2-node  versions.  The  division  of  labor  between  MIT  and  Motorola  is  as  follows:  MIT 
is  responsible  for  the  overall  system,  logic  and  chip  designs,  chip  fabrication,  a  novel  special 
tool  for  generating  microcode  from  opcode  specifications,  the  Id  language,  and  compiler 
design  and  development.  Motorola  is  responsible  for  all  board-level,  enclosure,  supporting
hardware  infrastructure  and  I-structure  logic  design,  development,  and  manufacturing.  On
the  software  side,  Motorola  is  responsible  for  the  Monasm  assembler,  dynamic  linking  loader, 
command  line  interpreter  user  interface,  all  host  level  software,  and  debugging  tools  including 
a  Monsoon  simulator. 

Motorola’s  project  is  managed  by  Jim  Richie.  Their  hardware  work  is  done  at  their  facility 
in  Arizona,  while  their  software  work  is  done  mostly  in  Cambridge  at  the  new  Motorola 
Cambridge  Research  Center  (MCRC),  which  they  have  established  as  part  of  this  project. 
The  immediate  focus  of  MCRC,  which  is  in  the  Kendall  Square  office  complex  next  to  LCS, 
is  close  cooperation  with  MIT  in  the  research  and  development  of  software  for  Monsoon.  In 
the  long  run,  MCRC  is  expected  to  take  its  place  alongside  the  many  fine  basic  research 
labs  in  the  vicinity  of  MIT.  The  first  employee  of  MCRC  was  Ken  Traub,  who  completed  his 
Ph.D.  in  our  group  in  1988.  Ken  was  the  original  architect  and  builder  of  our  Id  compiler. 
As  a  member  of  MCRC,  Ken  is  playing  a  leading  role  as  overall  architect  of  the  Monsoon 
software  system. 

During  the  year,  we  have  held  numerous  review  and  planning  meetings  with  Motorola: 

•  August  1,  1989:  Review  meeting  at  MIT. 

•  September  27-28,  1989:  Software  and  contracts  meeting  at  MIT. 

•  October  19,  1989:  Review  meeting  at  MIT. 

•  December  7,  1989:  Monsoon  technical  discussion  meeting  at  MIT. 

•  January  25-27,  1990:  Monsoon  hardware  and  software  progress  review  meeting  at 
Motorola,  Tempe,  AZ. 

•  March  29-30,  1990:  Review  meeting  at  MIT. 

•  June  25,  1990:  Monsoon  hardware  and  software  progress  review  meeting  at  Motorola, 
Tempe,  AZ. 


We  are  happy  to  report  that  all  critical  milestones  to  date  have  been  met.  We  expect  that 
the  first  2-node  prototype  will  be  available  during  the  third  quarter  of  1990,  and  the  first
16-node  prototype  during  the  second  quarter  of  1991. 


4.4  Other  External  Collaborations 

Our  work  on  Id  and  Monsoon  has  led  to  collaborative  efforts  with  many  research  groups 
outside  MIT. 

Upon  leaving  MIT  after  finishing  his  Ph.D.  thesis,  Bob  Iannucci  has  started  the  Empire 
Project  at  IBM  Research,  whose  goal  is  to  build  a  hybrid  dataflow-von  Neumann  machine 
similar  to  the  one  he  proposed  and  studied  here  in  his  thesis.  We  also  continue  to  collaborate 
with  K.  Ekanadham  of  IBM  Research,  who  is  leading  the  effort  to  target  our  Id  compiler  for 
that  machine.  During  the  summer  of  1989,  Shail  Aditya  worked  at  IBM  Research  on  their
Id  compiler. 

At  Sandia,  a  group  of  researchers  led  by  Gerald  Grafe  is  building  the  Epsilon  dataflow 
machine  which  is  similar  to  Monsoon  in  many  respects.  Jamey  Hoch  of  Sandia  is  working  on 
retargeting  our  Id  compiler  for  Epsilon.  In  addition,  they  will  be  using  our  PaRC  network 
switching  chip  in  the  interconnection  network  for  their  multiprocessor.  Ken  Steele  has  just 
left  MIT  to  join  the  group  at  Sandia.  We  have  participated  in  several  meetings  to  discuss 
collaboration  with  the  Sandia  project: 

•  July  31,  1989:  Sandia-DARPA  meeting  in  Washington  D.C. 

•  March  11-13,  1990:  Cooperation  meeting  at  Sandia,  Albuquerque,  NM. 

Karl  Ottenstein  of  Los  Alamos  and  Bob  Ballance  of  the  University  of  New  Mexico  are  working 
on  a  compiler  for  FORTRAN  on  Monsoon;  they  plan  to  use  the  backend  of  our  Id  compiler. 
Similarly,  Keshav  Pingali  of  Cornell  is  also  investigating  the  implementation  of  imperative 
languages  on  a  dataflow  machine — he,  too,  plans  to  use  the  backend  of  our  Id  compiler  in 
order  to  run  his  codes  on  Monsoon. 

In  a  separate  but  related  activity,  Arvind,  Rishiyur  Nikhil,  and  Jonathan  Young  were  members
of  the  design  team  for  Haskell,  the  new,  nonstrict  functional  programming  language.
The  team  included  about  15  prominent  researchers  in  functional  programming  from  the  U.S.
and  Europe.  The  report  on  Haskell  was  published  in  April  1990.  It  is  hoped  that  the  international
research  community  will  adopt  this  language  as  the  standard  for  nonstrict  functional
languages.

On  November  1-3,  1989,  we  held  a  Dataflow  Workshop  here  at  MIT.  In  addition  to  researchers
from  all  the  above  groups,  participants  included  recent  graduates  from  our  group,
and  researchers  from  Yale,  Manchester  University,  Tera  Computers,  Rice  University,  Oregon 
Graduate  Institute,  Glasgow  University,  Motorola,  and  Hewlett  Packard  Labs. 

On  April  26-27,  1990  we  held  a  Software  Cooperation  Meeting  here  at  MIT,  again  attended 
by  researchers  from  most  of  the  above  groups.  The  focus  was  on  discussing  how  each  of  us 
could  structure  our  work  to  maximize  sharing,  since  many  of  us  are  interested  in  targeting 
other  languages  to  Monsoon  and  in  targeting  Id  to  other  machines. 

One  of  the  outcomes  of  the  Software  Cooperation  Meeting  was  a  consensus  among  our  guests 
that  we  needed  to  run  a  workshop  dedicated  to  furthering  the  understanding  of  the  internal 
structure  of  the  Id  compiler.  This  workshop  has  been  scheduled  for  June  28-29,  1990  at 
MIT. 

On  February  2,  1990,  we  held  an  internal  (MIT)  workshop  on  multithreaded  architectures 
with  participants  from  our  group  and  the  groups  led  by  Anant  Agarwal,  Steve  Ward,  Bill 
Dally,  Tom  Knight,  as  well  as  Bert  Halstead  from  DEC  Cambridge  Research  Center.  The 
intent  was  to  get  a  better  understanding  of  each  other’s  work,  since  all  are  exploring  different
kinds  of  multithreaded  architectures. 

As  usual,  on  July  24-28,  1989,  the  summer  dataflow  course  (6.83s)  was  taught  here  at  MIT
by  Arvind  and  Rishiyur  Nikhil.  The  course  was  attended  by  approximately  20  external 
researchers. 
Also,  in  November  1989,  Arvind  and  Rishiyur  Nikhil  taught  a  one-day  tutorial  on  Id  at  the
Supercomputing  ’89  Conference  in  Reno,  Nevada. 

4.5  Id:  General  Topics 

4.5.1  Types  and  Incremental  Type-checking 

Continuing  his  work  on  the  incremental  type  inference  system  for  Id,  Shail  Aditya  developed
an  abstract  model  for  incremental  property  maintenance  and  applied  it  to  show  the
correctness  of  the  incremental  type  inference  system  developed  for  Id. 

Incremental  programming  environments,  such  as  Lisp,  aim  to  give  the  user  the  flexibility
to  write  the  sequence  of  definitions  constituting  a  program  one  by  one  and  in  arbitrary
order,  resolving  global  references  to  other  definitions  dynamically.  They  allow  the  user  to  edit  and
test  parts  of  an  incomplete  program,  or  to  debug  those  parts  that  are  incorrect,  without  worrying
about  the  status  of  the  rest  of  the  program.  However,  the  Hindley/Milner  static  type
inference  system  [227]  [84]  followed  in  Id  does  not  naturally  lend  itself  to  incremental  compilation.
Nikhil  [230]  discussed  the  issues  involved  and  outlined  a  high-level  mechanism  to
support  it.

Following  Nikhil’s  proposal,  Shail  Aditya  devised  an  abstract  scheme  to  adapt  the  Hindley/Milner
type  inference  system  for  incremental  compilation.  Subtle  incremental  interactions
were  discovered  between  the  types  of  a  given  set  of  definitions  and  their  partitioning
into  strongly  connected  components  (SCC),  definitions  that  are  mutually  recursive  with 
each  other.  Development  of  the  necessary  theoretical  framework  guided  modifications  in  the 
scheme  to  handle  polymorphic  and  mutually  recursive  definitions  correctly.  Essentially,  the 
present  scheme  consists  of  maintaining  an  upper  and  a  lower  type  bound  for  each  top  level 
identifier  along  with  its  current  SCC.  Inconsistencies  arising  due  to  the  declared  type  falling 
out  of  the  expected  range,  or  because  of  a  change  in  its  SCC,  are  detected  and  the  affected 
definitions  are  flagged  for  recompilation.  The  goal  is  to  show  an  exact  correspondence  between
the  types  inferred  in  the  incremental  scheme  and  those  inferred  when  a  complete  and
correct  program  is  given,  while  at  the  same  time  performing  minimal  recompilation  work
after  an  incremental  change  in  the  program.  The  detailed  proofs  of  the  correspondence  are
due  to  appear  in  Shail  Aditya’s  forthcoming  master’s  thesis.  Future  work  in  this  direction
will  be  to  optimize  the  space  and  time  requirements  of  the  incremental  bookkeeping  done  by 
the  compiler. 
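
The bookkeeping on strongly connected components can be pictured with a small Python sketch. It is not the scheme from the thesis: it omits the upper and lower type bounds entirely and only shows one ingredient, namely recomputing the SCCs of the definition reference graph after an edit and flagging the definitions whose component changed. All function names and the `even`/`odd` example are hypothetical.

```python
def strongly_connected_components(refs):
    """Tarjan's algorithm. refs maps each definition to the names it references."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in refs.get(v, ()):
            if w not in refs:
                continue  # reference to a definition that has not been supplied yet
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    for v in refs:
        if v not in index:
            visit(v)
    return sccs

def scc_of(refs):
    """Map each definition to the component that contains it."""
    return {name: comp for comp in strongly_connected_components(refs) for name in comp}

def needs_recheck(old_refs, new_refs):
    """Definitions whose component differs between two versions of the program."""
    old, new = scc_of(old_refs), scc_of(new_refs)
    return {name for name in new if old.get(name) != new[name]}

# Editing `odd` so that it calls `even` makes the two mutually recursive.
before = {"even": ["odd"], "odd": [], "main": ["even"]}
after = {"even": ["odd"], "odd": ["even"], "main": ["even"]}
print(needs_recheck(before, after))
```

In this toy run only `even` and `odd` are flagged; `main` keeps its singleton component, which is the kind of saving in recompilation work that the incremental scheme aims for.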

4.5.2  Managers 

Paul  Barth  and  Rishiyur  Nikhil  continued  their  work  on  managers.  Managers  add  nondeterminism
to  Id,  an  important  property  for  state-sensitive  computation,  as  required  by
operating  systems,  databases,  and  I/O.  Although  nondeterminism  is  a  powerful  feature,  it
introduces  a  new  class  of  programming  errors:  irreproducible  results.  We  are  addressing  this
at  the  language  level  by  encapsulating  managers  in  abstract  type  definitions.  A  manager  is 
an  abstract  type,  consisting  of  an  updatable  state  and  a  set  of  operations  that  access  and 
update  the  state.  Each  operation  computes  a  new  state  from  the  old  state.  Mutual  exclusion 
is  provided  for  the  state  so  that  concurrent  operations  do  not  interfere.  This  encapsulation
allows  state  invariants  to  be  proved  by  proving  them  for  each  operation. 
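
A rough analogue of this encapsulation can be written in Python, purely as an illustration. The `Manager` class, its `apply` method, and the free-list example are hypothetical and say nothing about Id's actual syntax or implementation; the sketch only mirrors the description above: a hidden state, operations that compute a new state from the old one, mutual exclusion, and an invariant checked per operation.

```python
import threading

class Manager:
    """An updatable state plus operations, each run under mutual exclusion."""

    def __init__(self, state, invariant=lambda s: True):
        self._state = state
        self._lock = threading.Lock()
        self._invariant = invariant
        assert invariant(state)

    def apply(self, operation, *args):
        """operation(old_state, *args) must return (new_state, result)."""
        with self._lock:
            new_state, result = operation(self._state, *args)
            # Checking the invariant per operation is enough to maintain it overall.
            assert self._invariant(new_state)
            self._state = new_state
            return result

def allocate(free_list, n):
    """Take n cells from the front of the free list."""
    return free_list[n:], free_list[:n]

def release(free_list, cells):
    """Return cells to the free list."""
    return free_list + cells, None

# A tiny memory manager: the invariant says no cell appears twice in the free list.
heap = Manager(tuple(range(8)), invariant=lambda s: len(set(s)) == len(s))

def worker(out, i):
    out[i] = heap.apply(allocate, 2)

results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Which worker receives which cells may vary from run to run, which is the nondeterminism discussed above, but no cell is ever handed out twice.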

Several  challenging  efficiency  concerns  are  now  being  addressed.  For  space  efficiency,  managers
with  a  complex  state  should  mutate  the  state  rather  than  copy  it.  For  parallelism,  such
managers  should  provide  fine-grained  mutual  exclusion,  so  that  independent  operations  can 
proceed  concurrently.  These  concerns  raise  syntactic  and  semantic  issues  in  the  design  and 
implementation  of  managers.  A  new  syntax  has  been  proposed  that  addresses  these  issues 
while  maintaining  a  clean  abstraction. 

Manager  applications  have  been  written  for  graph  algorithms,  sorting,  memory  management, 
union-find,  and  parallel  priority  queues.  The  traveling  salesman  problem  was  coded  using  a 
simulated  annealing  algorithm,  using  managers  for  both  path  mutation  and  random  number 
generation.  A  detailed  study  of  potential  parallelism  and  synchronization  bottlenecks  was 
performed  on  several  variations  of  the  algorithm. 

4.5.3  Sequentialized  Code  Execution 

We  are  currently  experimenting  with  writing  resource  managers  and  operating  systems  in 
Id.  These  kinds  of  programs,  which  make  use  of  imperative  side-effects,  must  have  explicit 
sequentialization  of  reads  and  writes. 

James  Hicks  has  extended  the  Id  language  and  compiler  for  sequential  constructs.  The 
new  syntax  sequentializes  the  execution  of  groups  of  bindings  in  let  blocks  or  loops.  To 
sequentialize  a  group  of  bindings,  use  ‘&’  instead  of  ‘;’  to  separate  the  bindings,  as  shown  in
this  example:

    { x0 = e0;
      x1 = e1
    & x2 = e2;
      x3 = e3
    in
      e4 }

This  ensures  that  the  evaluation  of  e2  and  e3  does  not  begin  until  all  computation  has  ceased
in  the  previous  two  bindings.  Note,  however,  that  e0  and  e1  may  execute  in  parallel,  and
that  e2  and  e3  may  execute  in  parallel;  sequentialization  is  only  inserted  between  binding
groups  separated  by  ‘&’.

Parentheses  may  be  used  to  group  bindings  and  enforce  more  arbitrary  synchronization
graphs.  Here  is  an  example:

    { x0 = e0;
      ( x1 = e1 &
        x2 = e2 )
    in e3 }

In  this  example,  e0  may  execute  in  parallel  with  e1  and  e2,  but  e2  may  not  begin  execution
until  expression  e1  terminates.
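
For readers unfamiliar with Id, the effect of the sequentializing separator in the first example can be mimicked in Python with a thread pool. This is only an analogy under stated assumptions: `run_groups` is a hypothetical helper, each binding is modeled as a thunk with no data dependencies, and nothing here reflects how the Id compiler actually implements sequentialization.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def run_groups(groups):
    """Run the thunks of each group in parallel; start a group only after
    everything in the previous group has finished."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for group in groups:
            futures = {name: pool.submit(thunk) for name, thunk in group.items()}
            wait(futures.values())  # the barrier between binding groups
            results.update({name: f.result() for name, f in futures.items()})
    return results

log = []
groups = [
    {"x0": lambda: log.append("e0"), "x1": lambda: log.append("e1")},  # first group
    {"x2": lambda: log.append("e2"), "x3": lambda: log.append("e3")},  # second group
]
run_groups(groups)
print(log)
```

The order of entries within each pair may vary between runs, but both entries of the first group always precede both entries of the second.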
4.5.4  Formalization  of  Id’s  Operational  Semantics

Zena  Ariola  and  Arvind  have  continued  their  work  on  giving  precise  operational  semantics  for 
Id.  The  approach  consists  of  translating  Id  into  a  simpler  and  smaller  kernel  language,  and 
giving  semantics  of  the  kernel  language  in  terms  of  an  Abstract  Reduction  System  (ARS). 
In  order  to  prove  the  correctness  of  compiler  optimizations,  a  notion  of  program  equality  is 
needed.  Such  a  notion  is  easier  to  define  for  an  ARS  than  an  interpreter.  P-TAC,  an  earlier 
attempt  to  define  such  a  language,  and  ARS  were  reported  last  year  [13]. 

P-TAC  was  a  simple,  low-level  language  that  allowed  us  to  capture  most  aspects
of  the  current  implementation  of  Id  on  the  dataflow  machines.  However,  it  was  so  far 
from  the  source  language  that  the  translation  procedure  (from  Id  to  P-TAC)  became  a 
serious  impediment  in  understanding  the  operational  semantics  of  Id.  Even  though  it  allowed 
many  program  optimizations  to  be  described  in  terms  of  source-to-source  P-TAC  program 
transformations,  it  ruled  out  certain  other  program  optimizations  because  the  information 
to  perform  them  was  essentially  lost  in  the  translation  process. 

Kid,  our  current  kernel  language,  is  essentially  a  de-sugared  version  of  Id,  that  is,  Id  without
comprehensions,  general  union  types  and  pattern  matching,  and  nondeterministic  features 
such  as  managers.  Both  array  and  list  comprehensions  can  be  expressed  in  terms  of  other 
Id  features  such  as  loops  and  “open”  lists.  Though  such  a  translation  is  not  simple,  it  can 
be  understood  in  its  own  right.  Similar  remarks  apply  to  complex  pattern  matching.  The
stage  at  which  type-checking  should  be  performed  is  still  an  open  question.  Given  the  type 
definitions,  type  checking  can  be  done  at  the  Kid  level  though  it  may  be  profitable  to  do  so 
at  an  earlier  stage. 

An  ARS  for  Kid,  which  includes  nested  function  definitions  and  loops,  has  been  defined. 
Many  loop  optimizations  and  partial  evaluation  have  been  expressed  as  Kid  source-to-source 
transformations.  Work  on  formalizing  the  printable  output  and  termination  of  Id  programs
is  underway.

4.6  Id:  Compiler  and  Run-time  Systems  for  Monsoon 

4.6.1  New  Compilation  Schemas  for  Dealing  With  Frames

James  Hicks  implemented  the  code  generator  for  the  bounded  loop  schema  for  the  Monsoon 
backend.  The  TTDA  bounded  loop  schema  is  much  different  from  the  Monsoon  bounded  loop 
schema  because  each  iteration  executes  in  a  different  context  on  Monsoon,  while  in  TTDA 
only  the  iteration  number  within  a  context  changed.  This  necessitates  a  change  in  the  D 
operator  that  routes  tokens  from  one  iteration  to  the  context,  or  activation  frame,  of  the  next 
iteration.  Another  change  is  that  the  synchronization  that  allows  only  k  iterations  to  execute 
at  once  must  be  performed  using  locks  and  two-phase  transactions — this  synchronization  was 
performed  with  bit-vectors  and  special  instructions  in  Gita. 

The  new  bounded  loop  schema  consists  of  three  parts:  setup,  iteration,  and  cleanup.  The 
setup  portion  consists  of  a  1-bounded,  or  sequential,  loop  that  allocates  a  ring  of  activation 
frames  and  fills  in  the  loop  constants  in  each  frame.  The  iteration  portion  consists  of  the 
actual  loop  body  plus  the  glue  necessary  for  synchronization  and  to  route  tokens  to  the 
next  iteration  or  to  the  outputs  of  the  loop.  The  cleanup  portion  of  the  loop  clears  and
deallocates  each  iteration  context.  We  have  taken  much  care  so  that  the  setup,  iteration, 
and  cleanup  portions  of  the  loop  may  be  overlapped  to  reduce  the  latency  incurred  by  loops 
that  execute  few  iterations.  The  setup  portion  allows  the  iterations  to  begin  as  soon  as  it 
has  set  up  one  activation  frame.  When  the  loop  predicate  evaluates  to  false,  the  cleanup
portion  of  the  code  is  triggered  with  the  continuation  of  the  next  context  in  the  ring,  which 
is  guaranteed  to  be  inactive  at  that  point.  The  cleanup  code  starts  with  that  context,  and 
continues  around  the  ring  with  the  proper  synchronization  to  ensure  that  it  does  not  overrun 
the  loop  body. 
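
The three-part structure can be pictured with a small Python simulation. It is a loose sketch, not Monsoon code: `bounded_loop`, the dictionary "frames", and the use of threads and semaphores are all assumptions made for illustration, and the sketch does not overlap setup and cleanup with the iterations as the real schema does. It shows only that a ring of k frames limits the loop to k concurrent iterations.

```python
import threading

def bounded_loop(body, n_iterations, k, constants):
    # Setup: allocate a ring of k frames and store the loop constants in each.
    ring = [{"constants": constants, "free": threading.Semaphore(1)} for _ in range(k)]
    active = 0
    peak = 0
    guard = threading.Lock()
    outputs = [None] * n_iterations

    def iteration(i):
        nonlocal active, peak
        frame = ring[i % k]
        with guard:
            active += 1
            peak = max(peak, active)
        outputs[i] = body(i, frame["constants"])
        with guard:
            active -= 1
        frame["free"].release()  # the frame may now be reused by iteration i + k

    # Iteration: iteration i runs in frame i mod k, once that frame is free.
    threads = []
    for i in range(n_iterations):
        ring[i % k]["free"].acquire()
        t = threading.Thread(target=iteration, args=(i,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

    # Cleanup: walk the ring and clear every frame.
    for frame in ring:
        frame.clear()
    return outputs, peak

outputs, peak = bounded_loop(lambda i, c: i * c["scale"], 10, 3, {"scale": 2})
print(outputs, peak)
```

Because a frame is claimed before its iteration starts and released only when that iteration ends, the recorded peak never exceeds k.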

4.6.2  Staging  the  Instruction  Set  Development  for  Monsoon

The  macroinstruction  set  of  Monsoon  is  a  “soft”  interface,  such  that  an  opcode  is  successively 
decoded  through  the  pipeline  by  downloadable  lookup  tables.  The  decode  tables  are  set  up 
by  a  host  processor  whenever  Monsoon  is  cold-booted.  In  Monsoon,  an  opcode  encodes 
the  effective  address  mode,  the  matching  mode  (e.g.,  join,  constant,  imperative),  the  ALU 
operation,  and  the  number  and  disposition  of  result  tokens.  For  example,  the  double  precision 
floating  point  subtract  operation  FSUB  consumes  32  opcodes  for  all  of  its  variants  of  one  vs. 
two  outputs,  constant  vs.  dyadic  matching,  etc. 

The  software  Monsoon  interpreter,  MINT,  is  also  indirectly  driven  by  the  decode  tables.  A 
preprocessing  program,  MUD,  takes  the  decode  tables  as  input  and,  for  each  opcode,  produces 
a  C  (originally  Lisp)  subroutine  which  is  later  compiled  and  linked  into  MINT. 
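
The idea of generating one specialized routine per opcode from a decode table can be sketched as follows. The table layout, field names, opcode numbers, and the `make_handler` function are invented for this example and do not describe the real Monsoon decode tables or MUD's output; the sketch shows only the general table-driven technique.

```python
import operator

# Hypothetical decode table: opcode -> (ALU operation, matching mode, result tokens).
DECODE_TABLE = {
    0: ("add", "dyadic", 1),
    1: ("add", "constant", 1),
    2: ("sub", "dyadic", 2),
}
ALU = {"add": operator.add, "sub": operator.sub}

def make_handler(alu_op, matching, n_outputs):
    """Build a routine specialized for one opcode from its table entry."""
    fn = ALU[alu_op]

    def handler(left, right, constant=None):
        # In "constant" mode the second operand comes from the instruction itself.
        operand = constant if matching == "constant" else right
        return (fn(left, operand),) * n_outputs  # one copy of the result per output

    return handler

# Generate one handler per table entry, once, when the table is loaded.
HANDLERS = {opcode: make_handler(*fields) for opcode, fields in DECODE_TABLE.items()}

print(HANDLERS[0](3, 4))
print(HANDLERS[1](3, None, constant=10))
print(HANDLERS[2](9, 4))
```

Changing the instruction set then means changing the table and regenerating the handlers, without touching the interpreter loop.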

In  order  to  manage  the  “bring-up”  and  validation  of  the  compiler,  the  software  which  generates
the  decode  tables  and  MINT,  we  have  partitioned  the  instruction  set  into  three  subsets
to  be  developed  in  stages.  The  first  set,  IS0,  has  approximately  60  opcodes  (out  of  a  possible
2048)  and  represents  a  very  minimal  instruction  set.  IS0  supports  frame  and  I-structure  allocation,
but  no  deallocation  and  only  single  deferred  readers.  Exceptions  are  not  supported,
nor  is  the  run-time  type  system  (Id  is  statically  typed).

The  next  stage,  IS1,  brings  the  total  to  a  few  hundred  opcodes  and  is  capable  of  supporting
the  entire  Id  language,  including  closures,  accumulators,  managers,  storage  reclamation,  and 
multiple  deferred  reads  against  I-structures. 

The  final  stage,  IS2 ,  encompasses  the  whole  instruction  set,  including  experimental  exten¬ 
sions  for  temporary  registers  and  threading.  As  of  June  1990,  the  ISO  set  has  been  certified 
by  a  series  of  tests  on  our  gate-level  simulator  of  a  Monsoon  processing  element,  and  an 
ISO  version  of  the  compiler  and  MINT  has  executed  a  simple  successive  over-relaxation  2D 
wavefront  problem. 

4.6.3  New  Backend  for  Monsoon

This  past  year  Andrew  Shaw  has  implemented  a  new  backend  for  the  Id  compiler  to  transform
Id  program  graphs  into  Monsoon  machine  language.  Prior  to  this,  we  had  been  generating
code  for  the  Monsoon  wire-wrap  prototype  using  an  interim  backend  that  was  a
modification  of  the  original  TTDA  (Tagged  Token  Dataflow  Architecture)  backend.  The
new  backend  uses  the  same  data  structures  as  the  middle  end  of  the  compiler,  and  several
new  optimizations  have  been  implemented,  along  with  the  standard  peephole  optimizations
that  were  in  place  with  the  old  TTDA  backend.  For  example,  the  calling  convention  constrains
the  entry  points  of  procedures  to  lie  in  consecutive  address  locations;  one  of  the  new
modules  can  relax  the  conservative  selection  of  these  instructions.  Since  the  Monsoon  architecture
has  some  assembly  constraints  on  the  layout  of  machine  code,  new  algorithms  were
designed  to  enforce  these  constraints  upon  the  final  output  code.  For  example,  two-output
instructions  are  constrained  to  have  one  of  their  destination  instructions  in  the  following
instruction  slot;  a  new  bipartite-matching  algorithm  was  implemented  to  find  a  near-optimum
selection  of  successor-constrained  instructions.
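
The successor constraint can be read as a bipartite matching problem: two-output instructions on one side, their candidate destinations on the other, with each destination allowed to sit directly after at most one instruction. The sketch below uses a standard augmenting-path matching (Kuhn's algorithm) as a stand-in. The report does not describe the backend's actual algorithm, and this version ignores further layout issues such as chains and cycles of successors; the function and argument names are hypothetical.

```python
def max_successor_matching(dests):
    """Choose a slot-following successor for as many instructions as possible.

    `dests` maps each two-output instruction to its destination instructions.
    Returns {instruction: chosen_successor}; no successor is chosen twice.
    """
    owner = {}  # successor -> instruction currently placed just before it

    def try_place(instr, seen):
        for d in dests[instr]:
            if d in seen:
                continue
            seen.add(d)
            # Take d if it is free, or if its current owner can move elsewhere.
            if d not in owner or try_place(owner[d], seen):
                owner[d] = instr
                return True
        return False

    for instr in dests:
        try_place(instr, set())
    return {instr: d for d, instr in owner.items()}


layout = max_successor_matching({
    "i1": ("a", "b"),
    "i2": ("a", "b"),
    "i3": ("b", "c"),
})
```

Instructions left unmatched would need extra code (for example, an inserted forwarding instruction) to satisfy the constraint, which is why maximizing the matching matters.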

In  addition,  an  interim  loader  was  implemented  to  interface  with  the  new  Monsoon
Interpreter,  since  the  full  loader  has  not  yet  been  implemented.

4.6.4  Compiler-directed  Storage  Reclamation  for  Id 

We  have  been  experimenting  with  structure-storage  management  over  the  past  year.  Informal 
studies  have  shown  that  functional  languages  typically  “cons”  at  four  times  the  rate  of  Lisp 
programs.  This  high  rate  of  storage  allocation  means  that  functional  programs  have  a  great 
dependency  on  garbage  collection.  Unfortunately,  garbage  collection  can  be  very  expensive, 
especially  on  a  parallel  machine. 

We  have  introduced  a  pragma,  @Release,  into  Id  to  annotate  structures  that  are  temporary
and  that  should  be  deallocated.  When  the  Id  compiler  sees  an  @Release  annotation  on  a
structure,  it  inserts  code  to  deallocate  the  structure  upon  termination  of  the  nearest  enclosing 
conditional  branch,  procedure  body,  or  loop  iteration. 

There  is  one  further  optimization  the  compiler  can  perform  with  @Releases.  If  a  structure
is  allocated  in  a  loop,  and  deallocated  in  the  same  or  next  iteration,  then  the  compiler  can 
lift  the  allocate  and  deallocate  out  of  the  loop  to  reduce  the  overhead  of  calls  to  the  storage 
manager.  Outside  the  loop,  k  copies  of  the  structure  will  be  allocated,  where  k  is  the  loop 
bound.  These  will  be  used  by  the  iterations  of  the  loop.  After  the  loop  terminates,  all  k 
structures  will  be  deallocated. 
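
The effect of lifting can be seen in a toy model that only counts calls to the storage manager. This is a sketch under stated assumptions: the `StorageManager` class, its bulk `allocate(count)` call, and the loop body are hypothetical, and Python lists stand in for Id structures.

```python
class StorageManager:
    """Toy storage manager that counts calls and live structures."""

    def __init__(self):
        self.calls = 0
        self.live = 0

    def allocate(self, count=1):
        self.calls += 1
        self.live += count
        return [[None] * 3 for _ in range(count)]  # `count` fresh 3-tuples

    def deallocate(self, structures):
        self.calls += 1
        self.live -= len(structures)


def loop_unlifted(mgr, k):
    total = 0
    for i in range(k):
        (temp,) = mgr.allocate()   # temporary annotated for release
        temp[0] = i
        total += temp[0]
        mgr.deallocate([temp])     # released at the end of the iteration
    return total


def loop_lifted(mgr, k):
    temps = mgr.allocate(k)        # k copies allocated before the loop
    total = 0
    for i in range(k):
        temp = temps[i]            # iteration i uses its own copy
        temp[0] = i
        total += temp[0]
    mgr.deallocate(temps)          # all k copies released after the loop
    return total
```

In this model the unlifted loop makes 2k manager calls and the lifted loop makes two, at the price of holding all k structures live for the duration of the loop.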

The  @Release  pragma  has  been  used  extensively  by  Olaf  Lubeck  in  his  Id  implementations
of  the  Gamteb  photon  transport  benchmark  and  the  Particle-in-Cell  (PIC)  code.  In  October 
1989,  Olaf  Lubeck,  James  Hicks  and  Paul  Johnson  got  Gamteb  to  run  on  the  Monsoon 
prototype.  This  version  of  Gamteb  was  annotated  so  that  it  did  not  leak  any  storage — all 
structures  that  became  garbage  were  deallocated.  The  largest  problem  run  on  the  prototype 
started  with  40,000  particles.  It  allocated  300,000  9-tuples,  200,000  3-tuples,  and  270,000 
activation  frames  of  size  512.  When  it  completed,  only  616  words  of  storage  were  still 
allocated — and  that  contained  the  answer.  This  is  quite  impressive  considering  that  the 
prototype  only  has  128K  words  of  memory,  and  only  half  of  that  is  used  for  the  heap.  This 
work  has  shown  that  explicit  structure-storage  management  is  useful;  it  allows  us  to  run 
programs  that  could  not  be  run  otherwise. 

James  Hicks  is  working  on  compiler  analysis  for  the  verification  and  automatic  insertion  of
@Releases  in  his  Ph.D.  research.  The  goal  of  this  work  is  to  have  the  compiler  analyze
programs  to  determine  the  lifetime  of  structures,  and  to  insert  code  to  deallocate  structures
that  are  no  longer  needed.  The  compiler  performs  lifetime  analysis  by  using  abstract
interpretation — it  interprets  the  program  over  an  abstracted  value  domain  at  compile  time
in  order  to  determine  which  expressions  in  the  program  allocate  structures,  and  where  those
structures  may  be  used.

In  scientific  codes,  which  tend  to  have  very  regular  control  and  data  flow,  most  (if  not  all) 
of  the  structures  that  become  garbage  should  be  detected  and  deallocated  by  the  compiler. 
Hicks  has  performed  some  experiments  with  simple  program  analyzers  that  support  this 
belief.  His  analyzer  detected  23  of  the  25  @Releases  inserted  into  Gamteb  by  Lubeck.  The
analyzer  also  was  run  on  Simple;  in  this  case  the  compiler  inserted  deallocates  that  at  run 
time  deallocated  85%  of  the  garbage  created  by  four  iterations  of  Simple  on  a  10  by  10  grid. 
By  the  end  of  fall  1990,  the  compiler  should  be  able  to  insert  code  to  deallocate  all  of  the 
garbage  created  by  this  program. 

4.6.5  Run-time  Systems  for  Id

Jonathan  Young  has  led  a  major  effort  on  developing  the  Id  Run-time  System  (RTS).  The 
RTS  is  designed  to  be  a  flexible  interface  to  the  low  level  primitives  needed  to  execute  Id 
programs.  The  RTS  is  composed  of  three  main  parts:  the  allocation  and  deallocation  of
contexts,  both  fixed-  and  variable-size,  for  procedure  invocations;  the  allocation  and  deallocation
of  aggregates  for  data  structures;  and  the  management  of  input  from  and  output  to
the  outside  world.  Most  of  our  work  this  year  has  focused  on  the  first  two  parts.

As  a  stopgap  measure  until  we  have  a  simulator  capable  of  executing  SVC  trap  instructions, 
we  have  coded  primitive  heap  and  context  allocators  for  the  new  Monsoon  machine  which 
maintain  two  pointers  to  the  beginning  and  end  of  the  heap.  Since  they  do  not  reuse 
storage,  when  the  pointers  cross,  the  machine  dies.  These  allocators  are  generated  inline  by 
the  compiler,  and  have  allowed  the  rest  of  the  software  development  to  proceed. 
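
A minimal sketch of such non-reusing allocators follows. It assumes one possible reading of the description, in which heap objects are carved from the low end of the region and contexts from the high end; the class and method names are hypothetical, and an exception stands in for the machine halting.

```python
class OutOfStorage(Exception):
    """Raised when the two allocation pointers would cross."""


class BumpAllocators:
    def __init__(self, base, size):
        self.low = base           # next free word for heap objects (grows upward)
        self.high = base + size   # one past the last free word (contexts grow downward)

    def alloc_heap(self, words):
        if self.low + words > self.high:
            raise OutOfStorage("pointers crossed")
        addr = self.low
        self.low += words
        return addr

    def alloc_context(self, words):
        if self.high - words < self.low:
            raise OutOfStorage("pointers crossed")
        self.high -= words
        return self.high
```

Each allocation is a compare and an add, which makes the scheme cheap enough to generate inline, but nothing is ever returned to the region.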

We  expect  that  as  IS1  (instruction  set  level  1,  including  SVC  instructions)  becomes  operational
on  the  simulator,  we  will  be  able  to  test  and  debug  the  full  functionality  of  the
two-phase  operations  and  the  exception  mechanism.  Once  this  happens,  the  Id  RTS  can 
begin  to  execute  via  SVC  instructions,  although  most  of  the  RTS  handlers  will  simply  make 
a  procedure  call  at  this  point.  When  registers  are  finally  added  to  the  simulator  (IS2),  we 
expect  that  the  RTS  will  become  much  more  efficient. 

There  are  several  problems  to  be  addressed  in  order  to  manage  storage  efficiently  on  a 
Monsoon  multiprocessor.  In  particular,  we  must  avoid  network  traffic — most  of  the  manager 
code  must  be  completely  local.  We  also  wish  to  ensure  that  the  critical  sections  are  short 
and  the  reuse  of  memory  is  as  high  as  possible. 

On  the  new  Monsoon  machine,  each  processor  will  allocate  contexts  locally,  and  using  the 
exception  mechanism,  each  thread  must  be  able  to  allocate  a  context  independent  of  the 
other  threads  in  the  pipeline.  We  have  designed  and  implemented  (but  not  debugged)  a 
scheme  which,  on  average,  achieves  this  behavior  by  caching  a  small  number  (16)  of  contexts 
with  each  thread  while  linking  the  rest  into  a  processor-global  context  free  list. 
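
The caching idea can be sketched as follows. This is an illustration only: the class name, the refill and spill policy (moving a whole cache of 16 at a time), and the counters are assumptions, and a real implementation would need to protect the processor-global list against concurrent access.

```python
CACHE_SIZE = 16   # contexts cached per thread (figure taken from the text)


class ThreadContextCache:
    def __init__(self, global_free_list):
        self.global_free = global_free_list   # processor-global free list (shared)
        self.cache = []                       # this thread's private contexts
        self.fast_paths = 0
        self.slow_paths = 0

    def allocate(self):
        if self.cache:
            self.fast_paths += 1
        else:
            # Underflow: refill the private cache from the global list.
            # (Raises IndexError if the global list is exhausted.)
            self.slow_paths += 1
            for _ in range(CACHE_SIZE):
                self.cache.append(self.global_free.pop())
        return self.cache.pop()

    def deallocate(self, ctx):
        if len(self.cache) < CACHE_SIZE:
            self.fast_paths += 1
        else:
            # Overflow: spill the private cache back to the global list.
            self.slow_paths += 1
            self.global_free.extend(self.cache)
            self.cache.clear()
        self.cache.append(ctx)
```

With this policy a run of allocations (or deallocations) touches the shared list only once per 16 operations; every other operation uses thread-private state.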

Under  this  scheme,  each  allocation  (and  deallocation)  of  fixed-sized  contexts  will  take
approximately  six  instructions  (exception,  load  cache  pointer,  return  fetched  context  to  caller,
increment  pointer,  and  store  pointer)  normally  and  20  instructions  for  the  exceptional  case
that  the  free  (or  empty)  list  has  over-  or  underflowed.  Since  this  happens  statistically  once
every  16  operations,  the  amortized  cost  is  about  (15  x  6  +  20)  /  16  =  6.9,  i.e.,  no  more
than  seven  instructions.

On  the  new  architecture,  we  aim  to  solve  several  new  problems  when  allocating  objects  from
the  global  heap.  First,  the  heap  will  be  interleaved  across  multiple  nodes  in  the  system. 
While  no  work  is  needed  to  achieve  this  interleaving,  the  heap  manager  will  need  to  create 
pointers  which  take  full  advantage  of  the  hardware  interleaving  mechanism. 

Second,  multiple  processors  will  be  handling  multiple  simultaneous  heap  requests.  Even 
though  the  heap  is  remote,  we  desire  to  handle  the  majority  of  all  requests  locally.  This 
requires  some  PE-local  heap  data  structures.  Finally,  we  wish  to  avoid  interference  between 
different  threads  executing  in  parallel  on  the  same  PE. 

The  heap  manager  on  the  Monsoon  multiprocessor  is  a  hybrid  of  two  schemes.  Each  processor 
will  run  a  local  allocator,  a  version  of  quick-fit  which  utilizes  the  assumption  that  if  an  object
is  deallocated,  it  is  likely  that  another  object  of  the  same  size  will  be  allocated.  When  the 
local  allocator  runs  out  of  memory,  however,  it  will  ask  the  global  allocator  for  more  storage, 
pre-allocating  a  large  block  of  memory  for  future  requests.  While  allocating  storage  on  the 
global  heap  does  require  network  traffic,  this  should  be  tolerable  because  it  is  so  infrequent. 
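
The hybrid scheme can be sketched as a per-processor quick-fit allocator backed by a global allocator. This is a simplified illustration, not the Monsoon heap manager: the class names, the 1024-word refill size, and the decision to abandon the unused tail of an exhausted local block are all assumptions.

```python
from collections import defaultdict


class GlobalAllocator:
    """Stands in for the global heap; each request models network traffic."""

    def __init__(self):
        self.next_addr = 0
        self.requests = 0

    def grab_block(self, words):
        self.requests += 1
        addr = self.next_addr
        self.next_addr += words
        return addr


class LocalAllocator:
    BLOCK_WORDS = 1024   # hypothetical pre-allocation size

    def __init__(self, global_alloc):
        self.global_alloc = global_alloc
        self.free_lists = defaultdict(list)   # object size -> freed addresses
        self.next_free = 0
        self.limit = 0                        # local block is [next_free, limit)

    def allocate(self, words):
        if self.free_lists[words]:            # quick path: reuse a same-size object
            return self.free_lists[words].pop()
        if self.next_free + words > self.limit:
            # Local block exhausted: pre-allocate a large block from the global heap.
            block = max(self.BLOCK_WORDS, words)
            self.next_free = self.global_alloc.grab_block(block)
            self.limit = self.next_free + block
        addr = self.next_free
        self.next_free += words
        return addr

    def deallocate(self, addr, words):
        self.free_lists[words].append(addr)
```

In this sketch a program that repeatedly frees and reallocates objects of the same size never leaves the processor, and even a long run of fresh allocations reaches the global allocator only once per block.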

Arun  Iyengar  has  also  become  actively  involved  in  designing  the  Id  storage  managers. 

4.7  Applications 

Paul  Barth  and  Stephen  Brobst  implemented  a  number  of  approaches  to  parallel  simulated 
annealing  for  the  traveling  salesman  problem.  Manager  extensions  to  the  Id  programming 
language  were  used  to  facilitate  the  implementation  of  critical  sections  while  performing 
update-in-place  operations  on  the  adjacency  matrix  of  cities  in  the  algorithm.  A  number  of 
paradigms  for  the  use  of  managers  in  implementing  the  algorithm  were  explored:  Compare 
and  Swap,  Canonical  Ordering,  Master  Lock,  and  Locking  with  Back-Off.  It  was  found 
that  fine-grained  parallelism  was  exposed  very  naturally  in  the  model  of  execution  provided 
by  dataflow.  However,  it  was  found  that  there  is  significant  coarse-grain  sensitivity  within 
the  program  to  the  particular  algorithm  implemented  for  managing  critical  sections.  It  was 
important  to  not  over-serialize  execution  of  loops  corresponding  to  different  temperatures  in 
the  simulated  annealing  algorithm.  However,  it  was  also  critical  to  consider  the  contention 
resulting  from  too  many  parallel  loop  iterations. 

This  contention  resulted  from  two  sources.  One  source  was  the  contention  for  the  current  seed 
value  of  the  random  number  generator.  This  problem  was  addressed  by  initiating  multiple, 
parallel  random  number  generators.  The  other  source  of  contention  was  on  the  cities  to  be 
swapped.  In  general,  to  swap  two  cities,  it  is  required  to  grab  locks  on  six  cities  (the  two 
to  be  swapped  as  well  as  the  two  neighbors  for  each  city).  The  methods  of  managing  this 
locking  had  a  large  impact  on  performance.  As  expected,  any  use  of  global  locks  introduced 
a  major  synchronization  bottleneck  for  large  instances  of  the  problem.  By  optimizing  the 
“fast  path”  of  execution  through  the  locking  primitives,  we  were  able  to  reduce,  but  by  no 
means  eliminate,  the  impact  upon  the  critical  path  of  our  computation. 
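
One way to avoid both a global lock and deadlock is to acquire the needed locks in a fixed global order, which is how we read the "Canonical Ordering" paradigm named above. The sketch below is an illustration in Python threads rather than Id managers, and it makes two simplifying assumptions: it locks tour positions instead of city records, and the class and function names are hypothetical. Each worker also owns a private random number generator, mirroring the fix for seed contention.

```python
import random
import threading


class Tour:
    def __init__(self, cities):
        self.order = list(cities)
        self.locks = [threading.Lock() for _ in self.order]  # one lock per tour slot

    def slots_to_lock(self, i, j):
        """Slots i and j plus both neighbors of each (up to six), in ascending order."""
        n = len(self.order)
        return sorted({(p + d) % n for p in (i, j) for d in (-1, 0, 1)})

    def swap(self, i, j):
        slots = self.slots_to_lock(i, j)
        for s in slots:                  # always acquire in ascending slot order
            self.locks[s].acquire()
        try:
            self.order[i], self.order[j] = self.order[j], self.order[i]
        finally:
            for s in reversed(slots):
                self.locks[s].release()


def run_swaps(tour, n_threads=4, swaps_per_thread=1000):
    def worker(seed):
        rng = random.Random(seed)        # private generator: no shared seed to contend for
        n = len(tour.order)
        for _ in range(swaps_per_thread):
            tour.swap(rng.randrange(n), rng.randrange(n))

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because every thread requests its locks in the same ascending order, no cycle of waiting threads can form, and swaps on disjoint parts of the tour proceed in parallel.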

Amin  Salaam  and  Rishiyur  Nikhil  have  been  studying  an  implementation  of  the  Viterbi 
Search  in  Id.  Viterbi  Search  is  a  key  component  of  speech  recognition  systems.  The  inputs 
to  the  search  are  an  Acoustic-Phonetic  network  (APnet)  and  a  dictionary.  The  APnet  is  a
graph  generated  by  an  earlier  phase  that  performs  signal  processing  on  the  acoustic  signal
of  the  speech  utterance.  Each  arc  in  the  graph  represents  a  time  interval  in  the  utterance,
and  is  labeled  by  a  list  of  probabilities,  indicating  how  well  the  signal  in  that  time  interval
matched  each  of  all  possible  phonemes.  The  dictionary  is  also  a  graph  structured  around
words.  Each  word  is  represented  by  a  sequence  of  arcs  corresponding  to  phonemes,  and
words  that  can  legally  appear  in  sequence  are  also  connected  by  arcs.  Dictionary  arcs  are
also  labeled  with  probabilities  indicating  possible  omission  or  insertion  in  an  actual  utterance.
The  Viterbi  Search  algorithm  matches  paths  in  the  APnet  to  paths  in  the  dictionary,  finally 
emitting  the  “most  likely”  sequence  of  words  in  the  utterance.  Because  of  this  complex  graph 
structure,  the  algorithm  has  not  been  effectively  parallelized  to  date.  We  have  been  studying 
existing  code  (in  C  and  Scheme)  for  this  algorithm  since  March  1990  in  order  to  extract 
the  fundamental  aspects  of  the  algorithm.  We  plan  to  produce  a  parallel  implementation 
in  Id  and  to  run  it  on  Monsoon.  We  expect  that  this  will  shed  much  light  on  dataflow 
implementations  of  graph  algorithms  in  general. 

Other  applications  written  by  Rishiyur  Nikhil  and  Arvind  in  Id  include:  a  simulator  for  a  very 
abstract  model  of  an  Explicit  Token  Store  (ETS)  dataflow  machine  (of  which  Monsoon  is  a 
concrete  example),  and  various  versions  of  LU-decomposition  for  dense  and  sparse  matrices. 


4.8  Id  World,  the  Id  Programming  Environment 

Development  of  Id  World,  the  Id  programming  and  experimentation  environment,  continues 
with  a  focus  on  advances  which  benefit  both  the  old  TTDA/Gita  system  and  the  software
system  which  will  support  Monsoon.  Paul  Johnson  has  worked  on  improvements  in  Id  World
on  UNIX  workstations  and  software  system  construction  tools.  Id  Mode  for  Gnu  Emacs  provides
source  code  indentation  and  compilation  of  Id  programs.  With  the  assistance  of  Hicks,
simulator  statistics  graphs  and  overlays  of  multiple  graphs  are  available  under  the  X  Window 
System.  Id  World  Version  4.3,  which  was  released  in  April,  provides  these  improvements  and 
Id  compiler  pragmas  for  explicit  structure  storage  management.  As  of  Version  4.1,  Id  World 
runs  on  UNIX  workstations  under  Common  Lisp.  Version  4.3  has  been  tested  under  Allegro 
Common  Lisp  and  Lucid  Common  Lisp  running  on  Mips,  Motorola,  and  Sun  workstations. 
We  were  unsuccessful  in  running  Id  World  under  Austin-Kyoto  Common  Lisp  (AKCL)  due 
to  deficiencies  in  language  support  for  error  handling.  Improvements  in  software  system 
construction  tools  include:  handling  file  system  specifics — translation  of  program  filenames 
and  logical  pathname  support  in  the  Defprogram  facility,  startup  initializations  and  banner 
customization  for  Lucid  and  Allegro  disk  images,  consistent  versions  of  internal  software — 
generation,  and  use  of  generic  patch  files  for  our  development  systems. 

4.9  Monsoon  Hardware  Development 

4.9.1  Monsoon  Wire-wrap  Prototype  Processing  Element 

The  Id  compiler  now  produces  Monsoon  object  code  for  all  of  the  Id  language.  The  only 
restriction  to  running  an  Id  program  on  the  prototype  is  the  size  of  memory — the  prototype 
has  only  128K  words  of  data  memory,  of  which  half  is  used  for  I-structure  storage  and  half 
for  activation  frame  memory.

We  have  a  run-time  system,  written  in  Id  and  Monsoon  assembly  code  by  Young  and  Hicks,
that  manages  the  allocation  and  deallocation  of  activation  frames  and  heap  storage  on  the 
Monsoon  processor.  Activation  frames  are  managed  on  a  free  list,  and  heap  storage  is 
managed  by  either  the  buddy  or  first-fit  system.  Implementation  of  the  storage  managers 
has  been  very  difficult  because  concurrent  calls  to  the  storage  manager  can  be  interleaved 
on  an  instruction-by-instruction  basis. 

The  prototype  is  currently  used  to  measure  performance  and  instruction  mixes  of  applications 
under  the  Monsoon  instruction  set  architecture  with  the  run-time  system  described  above. 
These  measurements  have  shown  that  Id  can  be  run  efficiently  on  Monsoon.  The  dynamic 
instruction  mixes  collected  while  running  Id  applications  on  Monsoon  have  shown  that  the 
architecture  is  well  matched  to  the  language,  and  not  too  much  overhead  is  introduced  to 
support  instruction-level  parallelism.  However,  some  of  the  architectural  restrictions  imposed 
by  the  implementation  of  the  wire-wrap  processor,  such  as  the  restriction  of  one  explicit 
destination  per  instruction  and  10-bit  offsets  for  addressing  of  frame  locations,  have  caused 
excessive  difficulty  in  compilation  and  have  introduced  excessive  overhead  in  the  object  code. 
The  processor  has  been  redesigned  to  mitigate  these  problems. 

These  performance  measurements  have  also  shown  that  better  algorithms  must  be  used 
for  storage  management  so  that  concurrent  invocations  of  the  storage  manager  will  not 
cause  excessive  sequentialization  of  the  program.  This  sequentialization  will  be  magnified  on 
multiple-processor  Monsoon  systems  unless  better  algorithms  are  used.  Young  and  Armando 
Fox  have  designed  and  implemented  a  storage  manager  that  uses  multiple  local  free  lists  and
local  heaps  to  allow  concurrent  access  to  the  heap. 

4.9.2  The  Monsoon  Processing  Element  (Second  Generation) 

We  have  made  significant  progress  in  the  design  and  verification  of  a  second  generation 
Monsoon  processing  element  (PE).  This  board-level  design  was  transferred  to  the  Motorola 
Microcomputer  Division,  where  it  was  successfully  laid  out  and  routed.  It  is  now  in  the
process  of  being  manufactured.  We  also  designed  and  fabricated  two  application-specific
integrated  circuits  (ASICs):  a  byte-slice  of  the  datapath  and  tag/pointer  ALU.  Greg
Papadopoulos  was  responsible  for  the  overall  PE  and  ASIC  architectures.  Jack  Costanza  and
Ralph  Tiberio  executed  the  detailed  designs  and  simulations. 

The  original  Monsoon  wire-wrap  prototype  processor  was  made  operational  in  1988,  executing
its  first  compiled  Id  program  in  October  of  that  year.  The  new  PE  is  largely  compatible
with  the  original  prototype,  employing  an  eight-stage  pipeline,  64-bit  datapaths  (plus  eight 
bits  of  type),  and  32-bit  instructions.  The  new  PE  differs  in  several  important  respects: 

•  Interprocessor  Network:  The  first  prototype  could  operate  only  as  a  uniprocessor 
because  it  lacked  an  interprocessor  network.  The  new  PE  integrates  the  network  into
the  processor  by  employing  on-board  PaRC  and  Data  Link  chips,  as  well  as  input
and  output  FIFO  buffers.

•  VME  Interface:  The  new  PE  is  hosted  on  a  9U  form-factor  VME  bus.  The  VME 
bus  interface  implements  diagnostic  functions  (access  to  processor  scan  state,  single
stepping,  breakpointing),  VME  and  PE  interrupts,  and  high  speed  I/O  through  a  dual
ported  frame  store. 

•  Exception  Handlers:  The  new  PE  now  provides  a  way  to  efficiently  transfer  control 
to  state-preserving  exception  handlers.  Exceptions  can  be  induced  unconditionally 
(an  SVC),  elicited  by  standard  ALU  conditions  (like  overflow)  or  by  operand  type
inconsistencies.  In  fact,  we  will  use  the  SVC  mechanism  to  provide  dynamic  linking  to 
the  resource  management  system,  including  frame  and  heap  allocation/deallocation. 

•  Temporary  Registers:  The  new  PE  permits  state  to  be  communicated  between 
an  instruction  and  its  successor  through  a  small  set  of  temporary  registers  associated 
with  the  first  stage  of  the  ALU.  There  are  eight  sets  of  temporaries  with  three
72-bit  registers  per  set,  one  set  for  each  “logical  thread”  being  interleaved  by  the
eight-stage  pipeline.  It  is  expected  that  temporaries  will  measurably  improve  the  dynamic
efficiency  of  compiled  code. 

Experience  generating  code  for  the  wire-wrap  prototype  has  suggested  a  number  of  minor 
enhancements  to  the  processor  datapath.  For  example,  it  is  now  possible  to  use  the  current 
tag  as  one  of  the  arguments  to  the  ALU — optimizing  procedure  linkage  and  case  statements. 
We  have  also  attempted  to  make  the  design  more  manufacturable  by  providing  complete  scan 
coverage  of  all  internal  processor  state  and  parity  protection  for  instruction  memory,  frame 
store,  and  the  token  queues.  Figure  4.1  gives  the  block  diagram  of  the  processing  element 
datapath.  The  processor  is  designed  to  run  on  a  100  nanosecond  cycle  time,  yielding  a  sustained
processing  rate  of  10  million  tokens  per  second.  Compiled  code  presently  exhibits
a  dynamic  average  of  1.4  tokens  per  instruction.  Thus,  the  processor  pipe  should  deliver 
approximately  8  million  instructions  per  second,  any  fraction  of  which  may  be  floating  point 
operations.  The  first  set  of  processors  will  be  equipped  with  256K  words  (32  bits)  of  instruction
memory,  256K  words  (72  bits)  of  frame  store  memory  and  64K  words  (144  bits)
of  token  queue  memory.  Both  the  instruction  and  frame  store  memories  are  upgradable  to
1  MWord.

The  processing  element  detailed  design  was  captured  and  extensively  simulated  on  our 
Apollo/Mentor  Graphics  design  systems.  Both  arrays  were  implemented  in  LSI  Logic’s  10K 
series  of  1.5  micron  channelless  arrays,  and  packaged  in  144  pin  fine-pitch  quad  flat  packs.
The  DATAPATH  array  comprises  a  little  over  10,000  gates  and  implements  a  9-bit  slice  of 
the  datapath  pipeline  registers,  temporaries,  breakpoint  registers,  form  token  multiplexors, 
VME  interface,  and  all  static  RAM  parity  generation  and  check.  Each  processor  uses  eight 
DATAPATH  options.  The  PIU  gate  array  is  an  ALU  function  unit  specialized  for  tag  and 
pointer  manipulation.  The  PIU  array  comprises  approximately  6,500  gates.  Both  arrays 
have  been  successfully  prototyped  in  volumes  sufficient  for  an  initial  build  of  five  processing 
elements. 

Design  verification  emphasized  full-board  gate-level  simulation  and  timing  verification,
including  all  of  the  gate  arrays.  Simulation  tests  generally  took  the  form  of  small  hand-coded
dataflow  graphs  designed  to  test  various  aspects  of  the  instruction  processing  mechanism.
One  or  more  initial  tokens  would  be  introduced  into  the  pipeline,  and  then  the  simulation
was  allowed  to  “free-run”  using  the  processing  of  the  tokens  themselves  to  generate  the  various
test  vectors.  The  simulated  design  was  transferred  to  Motorola  in  the  Fall  of  1989.  A
12-layer  surface-mount  circuit  board  was  successfully  routed  by  Motorola  in  February  1990,
and  assembly  began  on  the  first  processor  prototypes  in  May  1990.  Simulation  and  verification
of  the  processor  remains  an  area  of  intense  activity.  We  have  generated  a  set  of  code
fragments  with  reference  timing  traces  to  aid  in  the  hardware  debugging  and  certification
process.

Figure  4.1:  Block  Diagram  of  Monsoon  Processing  Element  Datapath

4.9.3  The  Interconnection  Network  for  Monsoon 

Andy  Boughton,  Christopher  Joerg,  Juan  Ferrera,  and  Robert  Lustberg  have  continued 
development  of  the  network  for  Monsoon.  This  network  is  packet-switched  and  supports
a  bandwidth  of  800  MBits/sec/port.  The  two  primary  components  of  the  network  are  the 
Packet-switched  Routing  Chip  (PaRC)  and  the  Data  Link  Chip  (DLC).  These  components 
were  discussed  in  some  detail  in  last  year’s  progress  report. 

PaRC  is  a  CMOS  gate  array  designed  by  Chris  Joerg  that  forms  the  basis  of  the  Monsoon 
network.  PaRC  has  4  input  ports  and  4  output  ports,  each  of  which  is  16  bits  wide  and 
has  a  maximum  throughput  of  800  MBits  per  second.  Each  input  port  has  4  buffers,  each 
of  which  can  hold  one  packet.  PaRC  has  a  sophisticated  buffering  and  scheduling  strategy 
which  will  allow  an  output  port  to  transmit  a  packet  whenever  possible.  PaRC  uses  a 
CRC  code  to  detect  errors  on  received  packets.  PaRC  also  allows  a  processor  to  get  a  fast 
acknowledgment  that  its  message  has  been  received.  The  mechanism  for  this  is  able  to 
provide  the  acknowledgment  without  further  burdening  the  network. 

Chris  Joerg  has  finished  the  design  of  PaRC  and  the  generation  of  a  complete  set  of  test 
vectors.  PaRC  has  33,000  used  gates  and  is  capable  of  operating  at  50  MHz.  It  has  a  low 
latency  (100ns  in  light  traffic),  while  making  effective  use  of  its  bandwidth  (90%  utilization
in  heavy  traffic).  The  set  of  test  vectors  allows  the  vendor  to  check  for  defects  on  newly
fabricated  chips. 

PaRC  has  been  fabricated  in  LSI  Logic’s  1.5  micron  compacted  array  series,  and  working 
chips  have  been  received.  One  of  these  chips  has  been  placed  on  a  simple  test  fixture 
constructed  by  Juan  Ferrera.  This  setup  was  used  to  verify  the  timing  of  selected  output 
signals. 

The  DLC  is  an  ECL  gate  array  that  interfaces  16-bit  wide  PaRC  ports  to  4-bit  wide  interboard
cables.  Each  DLC  contains  one  data  link  transmitter  and  one  data  link  receiver.  Each
of  the  4  bits  of  the  interboard  data  path  is  differentially  driven  at  200  MBit/sec. 
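
The port and cable figures quoted in this section are mutually consistent, as the following arithmetic shows. It assumes that a PaRC port moves one 16-bit word per clock at the stated 50 MHz; the constant names are ours.

```python
PARC_PORT_WIDTH_BITS = 16      # width of each PaRC port
PARC_CLOCK_MHZ = 50            # stated PaRC operating frequency
DLC_CABLE_WIDTH_BITS = 4       # width of the interboard data path
DLC_RATE_MBIT_PER_LINE = 200   # rate at which each cable bit is driven

# Assumption: one 16-bit transfer per PaRC clock cycle.
parc_port_mbit = PARC_PORT_WIDTH_BITS * PARC_CLOCK_MHZ            # 16 * 50
dlc_link_mbit = DLC_CABLE_WIDTH_BITS * DLC_RATE_MBIT_PER_LINE     # 4 * 200

# The quoted 100 ns light-traffic latency, expressed in 50 MHz clock cycles.
latency_cycles = 100 * PARC_CLOCK_MHZ // 1000
```

Both products come to 800 MBits/sec, so the narrower, faster interboard link carries the full bandwidth of a PaRC port.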

Andy  Boughton  has  completed  the  design  of  DLC  and  the  generation  of  a  set  of  test  vectors. 
DLC  has  been  fabricated  in  Motorola’s  Mosaic  II  ECL  array  series  and  working  chips  have 
been  received. 

Juan  Ferrera  and  Robert  Lustberg  have  designed  and  constructed  a  test  fixture  for  the  DLC 
chip.  The  fixture  uses  two  DLCs  to  transmit  test  patterns  over  a  40’  datalink  cable.  Tests 
with  this  fixture  indicate  that  DLCs  can  be  used  to  reliably  interconnect  network  boards  in 
different  racks. 

The  first  use  of  PaRC  and  DLC  will  be  in  the  initial  version  of  the  Monsoon  processor  board. 
These  boards  are  currently  under  construction.  Each  of  these  boards  uses  one  PaRC  chip
and  one  DLC  chip.  These  boards  will  allow  PaRC  and  DLC  to  be  more  thoroughly  tested
in  an  operational  environment. 

Our  industrial  partner,  Motorola,  designed  4  input/4  output  network  boards  using  a  PaRC 
and  4  DLCs.  These  boards,  which  should  be  available  early  in  1991,  will  be  used  in  the
construction  of  16-node  Monsoon  systems. 

4.9.4  The  I-structure  Memory  Board 

Ken  Steele  completed  his  Master’s  thesis  entitled  Implementation  of  an  I-Structure  Memory 
Controller  in  February  1990.  The  design  was  kept  simple  to  reduce  cost  and  design  time. 
In  particular,  the  board  does  not  perform  local  management  of  deferred  continuation  lists. 
Instead,  the  compiler  allocates  storage  in  the  frame  for  one  cell  of  a  deferred  list.  When  an 
I-fetch  instruction  is  executed  against  an  empty  I-structure  location,  the  I-structure  board 
automatically  responds  with  a  token  whose  continuation  is  a  small  modification  of  the  normal 
return  continuation.  The  effect  of  this  modified  continuation  is  to  thread  the  deferred  list
through  all  the  frames  from  which  the  deferred  I-fetches  were  issued. 
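
The threading of the deferred list can be modeled in a few lines. This sketch simplifies heavily and should not be read as the board's actual protocol: it does not model tokens or continuations, it gives each frame a single deferred-list cell, and it lets the store walk the list directly. All names are hypothetical.

```python
EMPTY, DEFERRED, FULL = "empty", "deferred", "full"


class Frame:
    def __init__(self, name):
        self.name = name
        self.deferred_next = None   # the one compiler-reserved deferred-list cell
        self.received = None        # value delivered by the I-fetch


class IStructureCell:
    def __init__(self):
        self.state = EMPTY
        self.value = None   # when DEFERRED, holds the most recent waiting frame


def i_fetch(cell, frame):
    """Read the cell on behalf of `frame`; defer if the cell is not yet written."""
    if cell.state == FULL:
        frame.received = cell.value
        return
    # Thread the list through the reader's frame: the frame remembers the
    # previous head, and the cell remembers only the newest reader.
    frame.deferred_next = cell.value if cell.state == DEFERRED else None
    cell.value = frame
    cell.state = DEFERRED


def i_store(cell, value):
    """Write the cell and satisfy every deferred reader by walking the frames."""
    waiting = cell.value if cell.state == DEFERRED else None
    cell.state, cell.value = FULL, value
    while waiting is not None:
        waiting.received = value
        nxt = waiting.deferred_next
        waiting.deferred_next = None
        waiting = nxt
```

The memory location itself never holds more than one pointer, which is what allows the board to omit local management of deferred lists.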

Motorola  Microcomputer  Division  in  Tempe,  Arizona  is  building  an  I-structure  controller 
based  on  Steele’s  thesis  design.  Fabrication  of  the  hardware  is  currently  underway. 

4.9.5  Caching  for  Monsoon 

This  past  year,  Derek  Chiou  has  been  investigating  the  possibility  of  caching  on  Monsoon. 
The  prototype  Monsoon  uses  static  RAM  for  all  of  its  memory  at  the  present  time.  It  would 
be  desirable  to  use  slower,  cheaper  dynamic  RAM  for  future  iterations  of  the  processor.  If 
the  processor  is  to  run  at  competitive  speeds,  using  slower  RAMs  will  require  caching  of  some 
sort.  Software  has  been  developed  that  will  produce  an  instruction  trace  from  either  MINT 
or  the  Monsoon  prototype.  Chiou  also  wrote  a  cache  simulator  which  takes  an  instruction 
trace  and  collects  cache  data.  Most  of  the  serious  data  collection  has  been  done  for  very  small 
caches — generally  around  32  words  of  fully  associative  cache.  Results  have  been  reasonably 
promising,  with  hit  rates  ranging  from  33%  to  84%.  These  results  are  very  preliminary, 
however. 
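
A trace-driven simulator of this kind can be very small. The sketch below is not Chiou's simulator: it assumes least-recently-used replacement (the report does not state the policy), treats the trace as a plain sequence of word addresses, and uses hypothetical names.

```python
from collections import OrderedDict


def simulate_cache(trace, capacity=32):
    """Return the hit rate of a fully associative LRU cache over an address trace."""
    cache = OrderedDict()   # address -> None, least recently used entry first
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used word
            cache[addr] = None
    return hits / len(trace) if trace else 0.0


small_loop = simulate_cache(list(range(16)) * 4)    # working set fits in the cache
thrashing = simulate_cache(list(range(33)) * 2)     # working set is one word too large
```

The two synthetic traces bracket the behavior: a loop that fits in 32 words misses only on its first pass, while a cyclic sweep over 33 words defeats LRU completely.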

4.9.6  Completion  of  MINT,  a  Monsoon  Simulator

Last  year  we  reported  the  construction  of  MINT  (Monsoon  Interpreter),  a  simulator  of  the 
Monsoon  instruction  set.  MINT  has  a  variety  of  uses:  debugging  microcode  for  Monsoon, 
development  and  testing  of  Monsoon  object  code  generators  in  the  compiler,  gathering  of 
more  detailed  statistics  than  Monsoon  itself  is  capable  of  gathering,  etc. 

This  year,  Andrew  Shaw  has  extended  MINT  to  simulate  execution  on  multiple  Monsoon
processors.  The  extension  has  a  network  simulation  that  models  latency,  but  not  contention.  It  is
expected  that  contention  will  not  be  a  significant  problem  as  the  performance  of  the  Monsoon
network  is  very  high,  and  the  references  will  be  well  distributed,  as  data  is  interleaved  across
processors.  A  multiple-processor  run-time  system  was  implemented  to  distribute  processes
and  to  handle  resource  manager  requests.  Several  experiments  were  run  that  indicate  that
Gita  simulation  was  indeed  an  accurate  predictor  of  performance  in  Monsoon.  They  also
indicate  that  the  capacity  of  the  network  is  sufficient  to  handle  processors’  memory  requests
and  that  the  limiting  factor  in  parallel  execution  is  likely  to  be  serialization  induced  by
requests  to  the  resource  manager.  In  addition  to  multiple-processor  simulation,  an  I-structure
board  simulation  was  added  to  MINT.

4.10  Other  Activities 

4.10.1  Optimal  Interpreters  for  the  Lambda-Calculus 

Vinod Kathail completed his doctoral dissertation [156], in which he developed a new interpreter
for the λ-calculus that is optimal in the theoretical sense defined by J.-J. Levy [195],
and gave proofs of its correctness and optimality.

The interpreter is based on a new graph representation for λ-expressions that permits sharing
of  not  only  subexpressions  but  also  contexts,  i.e.,  parts  of  an  expression  that  are  not  complete 
subexpressions.  This  is  in  contrast  to  the  commonly  used  representations  of  expressions 
which  permit  sharing  of  only  subexpressions. 

The interpreter is presented as a graph reduction system along with a normalizing strategy
for applying the reduction rules. The set of rules includes a graph version of the β-rule
of the λ-calculus as well as certain other rules, some of which are similar to the rules for
handling environments in an environment-based interpreter for the λ-calculus. Some of the
nice features of the interpreter are as follows: first, all the reduction rules are local constant-time
operations on graphs; second, the reduction strategy for applying the rules is quite
simple; and finally, the input to the interpreter as well as the output of the interpreter are
“clean” representations of λ-terms. They do not contain various new types of nodes used by
the interpreter. A version of the interpreter has been implemented on Lisp Machines.

To prove the correctness of the interpreter, the thesis develops two calculi, called λk calculus
and λf calculus. λk calculus is essentially the term version of the graph reduction system
underlying the interpreter. λf calculus is obtained from λk calculus by removing certain
types of terms and reduction rules that are not very useful for terms. The thesis shows
the correspondence between the graph reduction system underlying the interpreter and λk
calculus, as well as the correspondence between the two calculi and De Bruijn notation [87].
Although λf calculus was motivated by the interpreter, it may be of general interest because
of the way it simulates changing of De Bruijn numbers.
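
For readers unfamiliar with De Bruijn notation, the following minimal sketch shows why De Bruijn numbers have to change during reduction. It is the standard textbook formulation of shifting and substitution on tree-shaped terms, not the calculi or the graph interpreter developed in the thesis. The tuple encoding of terms, the 0-based indices, and the function names are assumptions made for illustration.

```python
# Terms are tuples (hypothetical encoding):
#   ("var", n)       variable with 0-based De Bruijn index n
#   ("lam", body)    abstraction
#   ("app", f, a)    application


def shift(term, d, cutoff=0):
    """Add d to every free variable index (those >= cutoff) in term."""
    kind = term[0]
    if kind == "var":
        n = term[1]
        return ("var", n + d) if n >= cutoff else term
    if kind == "lam":
        # Going under a binder: one more index is now bound.
        return ("lam", shift(term[1], d, cutoff + 1))
    return ("app", shift(term[1], d, cutoff), shift(term[2], d, cutoff))


def subst(term, j, s):
    """Replace variable j in term by s, renumbering s under binders."""
    kind = term[0]
    if kind == "var":
        return s if term[1] == j else term
    if kind == "lam":
        return ("lam", subst(term[1], j + 1, shift(s, 1)))
    return ("app", subst(term[1], j, s), subst(term[2], j, s))


def beta(redex):
    """Contract a redex of the form ("app", ("lam", body), arg)."""
    tag, fn, arg = redex
    if tag != "app" or fn[0] != "lam":
        raise ValueError("not a beta-redex")
    body = fn[1]
    # Shift arg up before substituting, then shift the result down
    # because the outermost binder has disappeared.
    return shift(subst(body, 0, shift(arg, 1)), -1)


if __name__ == "__main__":
    identity = ("lam", ("var", 0))
    print(beta(("app", identity, ("var", 3))))
    const = ("lam", ("lam", ("var", 1)))
    print(beta(("app", const, ("var", 0))))
```

In the second example the free variable 0 of the argument ends up under one binder and must be renumbered to 1; this renumbering is the bookkeeping referred to above as changing of De Bruijn numbers.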

The thesis also strengthens an earlier result of Barendregt, et al., that states that if λ-expressions
are represented as trees, then there is no recursive (one-step) reduction strategy
that is optimal. The extension proved in the thesis provides some justification for the basic
assumption underlying the optimality criterion, i.e., the number of β-contractions performed
in reducing an expression is a good measure of the cost of reducing the expression.

4.10.2  P-RISC 

Madhu  Sharma  is  investigating  the  design  of  a  processor  that  is  a  concrete  implementation  of 
the P-RISC architecture. Most multithreaded architectures incur a large context-switching
cost. The cost may be incurred either in hardware, when register space is provided for
a large number of contexts, or in time, when contexts have to be swapped in and out of
processor register files. The proposed design virtually eliminates both elements of context-switching
cost. It caches contexts in multiple register sets in the processor, but manages
to mask the context-swapping cost using an additional port to the register file and a deep,
deterministic instruction lookahead mechanism. The design is claimed to be only marginally
more expensive than commercial RISC processors such as the SPARC.

A  detailed  simulator  for  the  architecture  has  been  developed.  Statistics  gathered  for  small 
handcoded  programs  run  on  the  simulator  indicate  that  the  architecture  does  manage  to 
mask  context-switching  cost  and  performs  well  under  low  or  high  parallelism. 

We are now developing compiling techniques for the architecture, with two approaches being
pursued. The first is a dataflow-graph-driven approach, wherein we start with a dataflow graph,
sequentialize threads of the computation to obtain larger threads (whenever there is no gain
in executing the threads in parallel), and arrive at a “control-flow graph”, which is translated
into P-RISC code. The second approach is the conventional “control-based” approach and
will be used for compiling imperative languages.

4.10.3  Compiling  Id  for  von  Neumann  Machines 

Bradley  Kuszmaul  has  been  working  on  retargeting  the  Id  compiler  for  stock  hardware,  such 
as  conventional  UNIX  workstations.  Starting  with  the  existing  dataflow  graphs  produced 
by  the  compiler,  he  translates  these  into  parallel  control-flow  graphs  based  on  the  P-RISC 
abstract machine model. The major effort is then in analysis and transformations on the
control-flow graph, including strictness analysis, subscript analysis, identification of threads,
transformations to lengthen threads and reduce synchronizations, and peephole optimizations.
Finally, these graphs are used to generate object code in the T language (T is a
dialect of Scheme). The existing T compiler already has a very sophisticated code generator
for a variety of stock machines (including register allocation, closure optimization, etc.).

We  expect  to  release  a  version  of  Id  World  using  this  new  compiler  by  June  30,  1990.  This 
implementation  should  substantially  increase  the  availability  of  Id  World  to  researchers  who 
may  not  have  access  to  Lisp  machines  or  Monsoon  dataflow  machines.  It  should  shed  light 
on  the  differences  in  implementation  requirements  between  nonstrict,  lenient  languages  like 
Id,  and  nonstrict,  lazy  languages  like  Miranda. 

4.10.4  Parallel  Persistent  Languages 

Michael  Heytens  and  Rishiyur  Nikhil  have  made  significant  progress  in  their  project  to  design 
and implement a parallel persistent language. The aim is to produce a system in which (a)
the user can declare, create and manipulate objects of arbitrary structured types; and
(b)  all  such  objects  are  automatically  persistent.  Such  a  system  can  be  viewed  as  a  synthesis 
of  programming  languages  and  databases. 

Because  of  the  rich  object  structure  of  the  language,  and  because  the  structure  can  change 
significantly  over  time,  high  performance  cannot  be  achieved  using  conventional  database 
methods  (detailed  planning  of  data  layouts  on  disks  and  scheduling  of  disk  activity).  Our 
approach  is  to  use  parallelism. 

We have designed a kernel database language that is greatly inspired by Id, i.e., it has
fine-grained, implicit parallelism. In addition, in update transactions, each field can be
redefined only once, as in I-structures; this allows update transactions also to be run in parallel.
We have implemented a compiler that translates transactions into dataflow graphs and then
into P-RISC code that is augmented with manager calls for disk I/O. As in Id and Monsoon,
another objective is to mask disk latencies using parallelism.

We  have  designed  a  segmented,  paged,  distributed  virtual  heap  for  the  persistent  system. 
Pages  of  the  virtual  heap  reside  in  files  in  each  processing  element,  and  are  fetched  on 
demand  into  page  frames  in  each  processing  element.  The  protocols  for  page  faults  and 
flushing  on  transaction-commit  have  been  designed,  and  the  filespace  occupied  depends  only 
on  the  heapspace  in  use  (not  the  entire  virtual  heap  address  space).  The  files  are  partitioned 
across  processing  elements  and  are  partitioned  by  type,  allowing  fast  traversals  of  collections 
of  objects  of  a  given  type.  The  files  may  also  be  indexed,  allowing  fast  direct  access  to 
individual  objects. 

A  prototype  is  being  implemented,  consisting  of  an  ensemble  of  P-RISC  emulators  running 
on  a  network  of  Sun  workstations.  It  is  currently  running  a  subset  of  the  language,  and 
we  expect  to  support  the  full  language  by  June  30,  1990.  After  this,  we  plan  extensive 
evaluations,  running  numerous  published  and  new  benchmarks,  and  porting  the  system  to  a 
real  multicomputer  with  parallel  disks. 

4.10.5  Bachelor’s  Theses  and  UROP  Projects 

Armando  Fox  (supervised  by  Jonathan  Young)  investigated  possibilities  for  more  efficient 
run-time  storage  managers.  A  number  of  traditional  schemes  were  analyzed,  with  particular 
attention to projected performance and synchronization requirements for a dataflow architecture.
A prototype of a storage manager was implemented which addressed the problems
of  global  resource  contention  and  long  critical  sections,  and  its  performance  was  compared 
to  the  existing  first-fit  manager.  Although  substantial  improvements  were  observed  in  a 
variety  of  cases,  overhead  associated  with  clearing  out  storage  for  reuse  still  accounted  for  a 
significant  fraction  of  the  deallocation  latency.  Increasing  the  efficiency  of  this  operation  and 
possibly  implementing  some  sort  of  garbage  collector  remain  topics  for  future  investigation. 

David  Plass  (supervised  by  Jonathan  Young)  implemented  a  parser  generator  which  produces 
LALR  shift-reduce  tables  in  the  dataflow  language  Id,  for  use  in  an  Id  parser.  The  algorithm 
employed avoids the creation of the LR(1) kernels by calculating LR(0) kernels and later adding
LALR lookahead information. In addition, a general purpose parser shell was implemented
which  can  be  used  in  conjunction  with  output  by  a  compiler  to  parse  source  language  inputs. 
This  work  will  help  in  our  effort  to  write  the  Id  compiler  in  Id. 

Alejandro Caro (supervised by Jonathan Young) designed and implemented a symbolic debugger
for the Id programming language on the Monsoon processor. The debugger allows
the user to trace function calls and returns, to examine local variable bindings and loop variable
bindings, and to examine the state of the machine in detail. Furthermore, the debugger
allows the user to invoke these functions at the source code level, relieving the user from
having to learn the intricacies of the processor and compilation schemes.

Glen Adams (supervised by Greg Papadopoulos) extended the Lisp MINT simulator to
model I-structure memory. An I-structure module simulator is capable of handling I-store,
I-fetch, I-put, and I-take requests, as well as ordinary reads and writes. This was interfaced
to the MINT system so that it is suitably initialized and driven by the queuing system. The
design is extensible to model multiple I-structure units.
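
The following minimal sketch illustrates the usual semantics of the two basic I-structure operations: an I-fetch that arrives before the corresponding I-store is deferred and satisfied later, and a cell may be written only once. The class name, the use of callbacks as continuations, and the error raised on a second write are assumptions made for illustration; I-put, I-take, and the interface to MINT are not modeled because this report does not describe them.

```python
class IStructureCell:
    """One write-once memory cell with deferred reads (illustrative only)."""

    EMPTY, DEFERRED, FULL = "empty", "deferred", "full"

    def __init__(self):
        self.state = self.EMPTY
        self.value = None
        self.deferred = []  # continuations of readers that arrived early

    def i_fetch(self, continuation):
        """Deliver the value to `continuation`, now or when it is stored."""
        if self.state == self.FULL:
            continuation(self.value)
        else:
            self.deferred.append(continuation)
            self.state = self.DEFERRED

    def i_store(self, value):
        """Write the cell once and satisfy all deferred readers."""
        if self.state == self.FULL:
            raise RuntimeError("I-structure cell written twice")
        waiting, self.deferred = self.deferred, []
        self.value = value
        self.state = self.FULL
        for continuation in waiting:
            continuation(value)


if __name__ == "__main__":
    received = []
    cell = IStructureCell()
    cell.i_fetch(received.append)  # arrives before the store: deferred
    cell.i_store(7)                # satisfies the deferred reader
    cell.i_fetch(received.append)  # cell is full: answered immediately
    print(received)
```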

Mike  Flaster  (supervised  by  Rishiyur  Nikhil)  implemented  a  compiler  for  a  small,  nonstrict 
language  that  uses  dependence  analysis  to  convert  a  nonstrict  program  into  a  strict  one. 
First,  the  compiler  performs  conventional  def-use  analysis,  as  well  as  subscript  analysis  for 
arrays in loops, using the Banerjee-Wolfe and GCD tests, augmented with some symbolic
subscript-expression reduction. Then, the compiler performs loop-splitting, loop-reversal,
loop-distribution, scalar expansion, induction-variable analysis, etc. to ensure that all dependencies
are forward dependencies. The program can now be run with the parallelism
that  is  best  suited  to  the  resources  of  a  given  machine. 
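
The GCD test mentioned above can be sketched in a few lines. For two references `A[a*i + b]` and `A[c*j + d]`, a dependence requires an integer solution of `a*i + b = c*j + d`, which exists only if gcd(a, c) divides d - b. The function name and argument order are assumptions made for illustration. The test ignores loop bounds, so a result of True means only that a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test(a, b, c, d):
    """Return False if A[a*i + b] and A[c*j + d] can never refer to the
    same element for any integers i and j; return True if a dependence
    is possible (loop bounds are not taken into account)."""
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return b == d
    return (d - b) % g == 0


if __name__ == "__main__":
    print(gcd_test(2, 0, 2, 1))  # A[2i] vs A[2j+1]: even vs odd elements
    print(gcd_test(2, 0, 4, 2))  # A[2i] vs A[4j+2]: may overlap
```

Tests that take loop bounds into account, such as the Banerjee-Wolfe test named in the text, can rule out additional cases that the GCD test alone reports as possible.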

UROP  student  Doug  Stetson  (supervised  by  Jonathan  Young)  implemented  a  compiler  which 
translates  a  small  subset  of  C  into  an  Id  program  graph,  the  intermediate  language  of  the  Id 
compiler.  The  subset  included  assignments,  conditionals  and  loops,  but  not  procedure  calls 
or pointers. The sequential semantics of C was enforced by artificial dataflow edges.

5.1 Introduction


The 1989-90 academic year saw the birth of the Computer Architecture Group, an interlaboratory
conglomerate whose research projects and goals subsume those of several previously
independent single-faculty groups. The CAG is still in its formative stages, and
continues  several  projects  previously  reported  elsewhere  (including  those  of  the  former  Real 
Time  Systems  Group  and  the  Alewife  project).  Our  goal  over  the  next  several  years  is 
to  combine  gratuitously  different  research  projects  to  yield  a  coherent  overall  approach  to 
architecture-related  research. 

Following  sections  report  progress  in  several  LCS  projects  which  have  continued  in  the  new 
CAG  venue. 

5.2  Alewife 

Agarwal,  Kranz,  and  others  continued  their  work  on  the  Alewife  machine  project.  The  goal  of 
the  Alewife  experiment  is  to  demonstrate  that  a  parallel  computer  system  can  be  made  both 
scalable  and  easily  programmable.  Scalability  will  be  achieved  through  an  architecture  that 
allows  the  exploitation  of  locality.  That  is,  for  programs  that  display  communication  locality, 
scalable  machines  can  offer  proportionally  better  performance  with  more  processing  nodes. 
A  program  running  on  a  parallel  machine  displays  communication  locality  if  the  probability 
of  communication  with  various  nodes  decreases  with  physical  distance.  Programmability  will 
be  achieved  through  automatic  management  of  locality  by  a  combination  of  hardware  and 
software  mechanisms. 

As  the  experimental  vehicle  for  this  research,  we  are  designing  and  implementing  the  parallel 
computer  system,  Alewife.  Users  will  be  able  to  write  a  large  class  of  parallel  applications  on 
this  machine  without  worrying  about  partitioning  and  placement  of  data  or  processes,  while 
achieving  speedups  comparable  to  those  available  from  carefully  handcrafted  programs. 

The  opposing  goals  of  scalability  and  programmability  are  hard  to  achieve  simultaneously. 
In  conventional  shared  memory  machines,  all  memory  accesses  incur  the  same  cost.  Thus, 
they  can  be  programmed  relatively  easily,  but  bus  or  network  bandwidth  limitations  hamper 
their  scalability.  Conventional  message-passing  multicomputers  scale  because  they  allow  the 
exploitation  of  locality  through  the  use  of  distributed  local  memory  and  direct  networks,  but 
the  user  has  to  explicitly  partition  and  place  data  and  processes,  which  makes  it  hard  to 
program  such  machines.  Perhaps  it  is  this  problem  that  prompted  the  following  remark  by 
Ken Kennedy in Computing Research News: “Contemporary parallel machines are architecturally
diverse and have reasonably primitive programming systems that expose the details
of the machine architecture to the user.”

The Alewife machine has a distributed shared memory organization with a cost-effective
mesh network. Such an architecture allows the exploitation of locality, and is scalable. In
our system, the hardware and software systems share the responsibility of enhancing locality
to reduce bandwidth demands on the network, resulting in a system that is also easy to
program.

Automatic locality management in Alewife is achieved by reducing latency of memory requests
when possible, and by latency tolerance otherwise.

Several  components  in  the  Alewife  system  cooperate  in  automatic  minimization  of  latency. 
Hardware-managed distributed caches significantly reduce the frequency of remote communications
by automatically copying frequently-used data locally. However, maintaining cache
coherence in scalable machines is a hard problem. We developed a novel method of enforcing
distributed cache coherence called LimitLESS directories (Limited directories Locally
Extended through Software Support). The LimitLESS directory works by maintaining a
small constant number (two to four) of pointers to shared copies of data in hardware, while
trapping the processor when a directory overflow occurs and extending the specific directory
entry into local memory. This simple scheme performs as well as the unscalable full map
scheme,  but  its  hardware  overhead  remains  constant  with  machine  size.  The  LimitLESS 
directory  works  well  because  the  fixed  set  of  pointers  suffices  for  the  common  case  of  limited 
sharing,  and  because  our  processor’s  rapid  context  switching  allows  efficient  trap  handling 
for  the  rare  cases. 

The  software  run-time  system  of  Alewife  allows  process  and  data  partitioning,  placement,  and 
migration for improving locality. A new scheme, called lazy futures, partitions tasks at run-time,
maximizing locality without compromising parallelism. A distributed tree scheduler
dynamically assigns tasks to processors, trying to balance the performance gains due to
locality with the need for load balancing. The compiler assists by clever data and process
decomposition  and  scheduling,  and  through  hints  to  the  run-time  system. 

When  the  system  cannot  obviate  a  remote  memory  request  and  is  forced  to  incur  the  latency 
of  the  communication  network,  the  Alewife  processors  attempt  to  tolerate  this  latency  by 
rapidly scheduling a runnable process in place of the stalled task. Alewife can tolerate synchronization
latencies as well through the same context switching mechanism. We designed
a new processor architecture, called APRIL, that can rapidly switch between processes. (In
our first-round implementation the switch will take 11 cycles.) The fast switching is achieved
by  caching  a  few  (four  in  our  implementation)  process  context  frames  on  the  processor  to 
eliminate  the  overhead  of  unloading  and  restoring  the  process  registers. 

To assess the extent to which scalability and programmability can be achieved through automatic
management of locality, we are building the Alewife machine and its software system
and implementing several large symbolic and numeric applications. Alewife’s rich performance
instrumentation will provide an accurate evaluation of our ideas with real parallel
applications  on  the  initial  64  node  machine,  and  experimentally  observed  communication 
locality  profiles  will  also  help  us  to  forecast  the  extent  to  which  such  schemes  scale  to  much 
larger  systems. 

Some  of  the  salient  developments  in  the  Alewife  machine  project  over  the  last  year  are 
outlined  below. 

5.2.1  The  Alewife  Machine  Hardware  Organization 

The architecture of the Alewife machine has been defined and we obtained detailed performance
estimates for a variety of applications through simulations. Figure 5.1 depicts the
Alewife machine as a mesh connection of a set of processing nodes. (Figure 5.3 depicts the
structure of the distributed LimitLESS directory.) Each Alewife node consists of a processor
called APRIL, a cache, a portion of globally-shared distributed memory, a cache-memory-network
controller, a floating-point coprocessor, and the network switch. Our current proposal
is to build a modest 64-node (8 x 8) experimental system. Measurements from this
machine will indicate the scalability potential of our ideas for a future larger-scale effort
based on a three-dimensional network called NuMesh (described in a later section).

Figure 5.1: The Alewife Machine Hardware Organization

5.2.2  ASIM:  A  Simulation  System  for  Alewife 

A  simulator  for  the  Alewife  machine  has  been  operational  since  June  1989.  The  simulator 
accurately  models  the  processor,  cache  and  memory,  and  the  interconnection  network.  A 
compiler  and  run-time  system  are  operational  and  produce  code  for  the  Alewife  processor. 
ASIM  simulates  roughly  10,000  processor  instructions  per  second  on  our  SPARCserver  330. 

ASIM has recently been augmented with many new features such as a floating-point coprocessor,
support for special processor mechanisms including remote process invocation, and
full-empty bit synchronization with support for arrays of full-empty bit data. ASIM also
implements other directory coherence protocols and network structures to enable architectural
comparisons. Several parallel applications have been written, compiled, and run on the
ASIM simulator. The simulator has been heavily instrumented and yields a wide range of
useful statistics including parallelism profiles, communication locality histograms, cache and
network statistics, processor utilization, process length distributions, run lengths between
synchronizations, etc. These execution profiles provide feedback to the programmer and help
in parallel program optimization. The structure of ASIM is depicted in Figure 5.2.

Figure 5.2: The Alewife Simulator System

5.2.3  Locality  Management  through  Caches 

We investigated the use of caches in providing efficient coherent shared memory. Caches
copy frequently-used data in a fast local memory and can obviate repeat requests to remote
memory. Furthermore, their operation is transparent to the programmer. David Chaiken has
designed the cache coherence protocol for the LimitLESS directory scheme. Figure 5.3 depicts
a LimitLESS directory node with two pointers implemented in hardware. The overflow
pointers for datum X are stored in the local chunk of main memory. When the overflow
pointers must be accessed, the controller traps the processor, and the processor then proceeds
to emulate a full-map protocol.
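
The bookkeeping behind this scheme can be sketched as follows: a directory entry holds a fixed number of hardware pointers, and further sharers are recorded in a software-managed overflow set at the cost of a trap. This is a simplified illustration only. The class and method names, the trap counter, and the handling of writes are assumptions; the actual protocol states and messages designed for Alewife are not modeled.

```python
class LimitLESSEntry:
    """Directory entry for one memory block (illustrative sketch)."""

    def __init__(self, hw_pointers=2):
        self.hw_limit = hw_pointers
        self.hw = []           # pointers held in hardware
        self.overflow = set()  # pointers kept in local memory by software
        self.traps = 0         # number of times the processor was trapped

    def add_sharer(self, node):
        """Record that `node` now holds a cached copy of the block."""
        if node in self.hw or node in self.overflow:
            return  # already recorded; the cost of this check is not modeled
        if len(self.hw) < self.hw_limit:
            self.hw.append(node)       # common case: handled in hardware
        else:
            self.traps += 1            # overflow: trap to software
            self.overflow.add(node)

    def sharers(self):
        """All nodes currently recorded as holding a copy."""
        return set(self.hw) | self.overflow

    def invalidate_all(self):
        """On a write, return the nodes to invalidate and clear the entry."""
        if self.overflow:
            self.traps += 1            # software must walk the overflow set
        targets = self.sharers()
        self.hw.clear()
        self.overflow.clear()
        return targets


if __name__ == "__main__":
    entry = LimitLESSEntry(hw_pointers=2)
    for node in (1, 2, 3):
        entry.add_sharer(node)
    print(entry.hw, entry.overflow, entry.traps)
    print(entry.invalidate_all(), entry.traps)
```

As long as a block has no more sharers than hardware pointers, the trap counter in this sketch stays at zero, which corresponds to the common case of limited sharing that the scheme is designed around.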

Chaiken and Kiyoshi estimated the performance of the LimitLESS scheme through simulations
and showed that LimitLESS directories perform almost as well as the non-scalable full
map directories. For example, for weather forecasting code, LimitLESS was only about 6%
worse (with a 50 cycle processor overhead for software handling of a remote request that
overflowed the directory and trapped the processor) than the full map protocol. Perhaps
more importantly, this scheme allows us to experiment with the required amount of hardware
support for cache coherence.

Figure 5.3: The LimitLESS Directory Scheme

5.2.4  The  APRIL  Processor 

Processor architectures of the future must change to reflect the special needs of multiprocessors.
Processors in multiprocessing environments must be able to tolerate long latencies
arising from synchronizations and remote memory operations that defy locality optimizations.
Additionally, LimitLESS directories and other functionality emulated through trap
software require efficient trap handling. Lim, Kubiatowicz, and others have designed a new
processor architecture called APRIL that meets these requirements. APRIL is a multithreaded
VLSI processor with high single-thread performance. A multithreaded processor
mitigates the negative effects of long communication and synchronization delays in multiprocessors
by overlapping these delays with computation from other processes. High single-thread
performance ensures reasonable behavior when the application lacks parallelism.

The  left  half  of  Figure  5.4  depicts  the  user-visible  processor  state  comprising  four  sets  of 
general  purpose  registers,  and  four  sets  of  Program  Counter  (PC)  chains  and  Processor 
State  Registers  (PSR).  The  PC  chain  represents  the  instruction  addresses  corresponding 
to  a  thread,  and  the  PSR  holds  various  pieces  of  process-specific  state.  Each  register  set, 
together  with  a  single  PC-chain  and  PSR,  is  conceptually  grouped  into  a  single  entity  called 
a  task  frame.  Four  such  task  frames  are  implemented  in  the  first  version  of  APRIL.  Task 
switching  happens  in  11  cycles  in  this  implementation.  Only  one  task  frame  is  active  at  a 
given  time  and  is  designated  by  a  current  frame  pointer  (FP).  All  register  accesses  are  made 
to  the  active  register  set  and  instructions  are  fetched  using  the  active  PC-chain. 
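
The organization of task frames described above can be sketched as a small model: each frame bundles a register set with its own program counter and status register, and a frame pointer selects the active one. The register count, the use of a single integer in place of the PC chain, the round-robin choice of the next frame, and all names are assumptions made for illustration; only the four frames and the 11-cycle switch come from the text.

```python
class TaskFrame:
    """One register set with its own PC and PSR (simplified)."""

    def __init__(self, n_regs=32):    # register count is an assumption
        self.regs = [0] * n_regs
        self.pc = 0                   # stands in for the PC chain
        self.psr = 0


class Processor:
    SWITCH_CYCLES = 11                # task-switch cost stated in the text

    def __init__(self, n_frames=4):
        self.frames = [TaskFrame() for _ in range(n_frames)]
        self.fp = 0                   # frame pointer: index of active frame
        self.cycles = 0

    @property
    def active(self):
        return self.frames[self.fp]

    def write_reg(self, r, value):
        self.active.regs[r] = value   # all accesses go to the active set

    def read_reg(self, r):
        return self.active.regs[r]

    def switch(self):
        """Activate the next frame; no registers are saved or restored."""
        self.fp = (self.fp + 1) % len(self.frames)  # round-robin (assumed)
        self.cycles += self.SWITCH_CYCLES


if __name__ == "__main__":
    cpu = Processor()
    cpu.write_reg(1, 42)
    cpu.switch()
    print(cpu.fp, cpu.read_reg(1), cpu.cycles)
```

Because each resident task keeps its state in its own frame, a switch in this model only changes the frame pointer, which is why no register unloading or restoring appears in `switch`.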

Kubiatowicz developed a simple memory reference based interface between the APRIL processor
and the Alewife cache/memory controller. Using a control word associated with each
memory reference, various types of synchronization or communication are synthesized
by the controller. This interface allows a simple implementation of the processor and the
controller by enabling graceful trapping into software for handling complicated rare conditions
for which dedicated hardware cannot be justified. The interface has been implemented
in the Alewife simulator.

Figure 5.4: The APRIL Processor Architecture

The implementation of the APRIL processor explores a new form of collaboration between
industry and university. We are modifying LSI Logic’s SPARC processor to avoid duplicating
the engineering effort in designing components not directly relevant to our research.
Besides other obvious benefits such as industry support, this approach affords the opportunity
to ride the industrial technology curve as new technologies evolve, allows rapid
transfer of new ideas from university to industry, and provides the ability to impact evolving
industry standards. The implementation, called MIT-SPARCLE, is a joint effort with LSI Logic
and Sun Microsystems. LSI Logic’s standard cell and gate array SPARCs allow high level
modifications of the processor. Currently our modifications have been incorporated into the
SPARC gate-array design at LSI and we are now moving into the design verification phase.

We  also  evaluated  the  performance  of  multithreaded  processors  in  large  scale  multiprocessors 
using an analytical model. For processor parameters derived from APRIL’s SPARC-based
implementation,  the  study  showed  that  multithreaded  processors  such  as  APRIL  can  achieve 
over  80%  efficiency  with  just  three  threads  with  a  10  cycle  memory  delay  in  a  3D  mesh 
network  with  base  average  latency  of  55  cycles. 

5.2.5  The  Interconnection  Network 

The  Alewife  controller  uses  a  simple  message-based  interface  with  the  network.  Various 
forms  of  shared  memory  coherence  models  are  maintained  by  the  controller  via  messages  to 
other nodes. We plan to use the Caltech Mesh Routing Chips for our switch nodes in the
two-dimensional 64-processor system. A much more aggressive three-dimensional interconnect,
called NuMesh, is being developed in the Computer Architecture Group in a collaborative
effort of Steve Ward, Tom Knight, Bill Dally, and Anant Agarwal. A future larger-scale
implementation of Alewife will use this network.

5.2.6 The Run-time System

Nussbaum  is  working  on  a  run-time  system  that  optimizes  locality  of  memory  referencing 
through  clever  dynamic  scheduling  methods.  He  implemented  a  tree  scheduler  together  with 
lazy  task  creation  to  manage  parallelism  and  locality.  Early  results  obtained  on  our  simulator 
indicate substantial benefits over other scheduling methods that are oblivious to the physical
nature of the interconnection network. The tree scheduler is currently being augmented with
support  for  multithreaded  processor  operation. 

5.2.7  The  Compiler  System 

David  Kranz  has  retargeted  the  Orbit  optimizing  compiler  for  the  APRIL  processor.  He  also 
implemented  a  novel  method  of  dynamic  process  partitioning  called  Lazy  Futures.  The  Lazy 
Futures  method  virtually  obviates  the  overhead  associated  with  task  creation  and  deletion 
when  tasks  run  on  the  same  processor  on  which  they  were  created.  For  example,  with  Lazy 
Futures,  the  sequential  version  of  an  application  runs  at  roughly  the  same  speed  as  a  single 
processor  execution  of  a  parallel  version  of  the  same  application. 

Maa and Johnson have been working on compiler methods for enhancing locality in parallel
programs by statically partitioning code and allocating data structures based on minimization
of nonlocal memory references. In Alewife, the task of the compiler is easier than in
other contemporary machines for several reasons. First, we assume the program is explicitly
parallel (by using program-level constructs such as futures). Second, the compiler-based
static methods can be integrated with our sophisticated run-time mechanisms. For example,
compilers often lack accurate execution time profiles to enable a good static schedule; instead,
our compiler can provide hints to the run-time system. Finally, Alewife’s shared memory
organization, dynamic scheduling, and caches largely relieve the burden of micro-managing
data access and synchronization.

Several linguistic extensions to our Mul-T programming language are necessary to support
static scheduling of processes and remote memory allocation (e.g., future-on and
make-vector-on). These facilities have been implemented in the Orbit compiler and the ASIM
simulator by Lim and Kranz.

We are examining several large data-parallel applications as test cases for our compiler
work. The lexical matching phase of the SUMMIT speech recognition system (developed
by the Spoken Language Systems Group) has been parallelized and ported to our system
by Johnson. The particle-in-cell (PIC) program, written originally by Olaf Lubek, has also
been parallelized and ported to Mul-T by Maa and others.

Structurally, PIC consists of a sequence of operations that produce successive intermediate
matrices. Each operation typically reads a few locations of its input matrices and computes
some locations of its output matrices, which are in turn used as input to subsequent
operations. Fine-grain architectures synchronize on individual elements instead of on entire
matrices, thereby exposing the producer-consumer parallelism. Because each of these operations
involves a relatively trivial amount of work (usually a sum of a few products), the overhead
of scheduling and synchronization becomes prohibitively high. To reduce these costs, as well
as communication costs (i.e., to improve locality), it is beneficial to merge some of the
operations. By merging an upstream operation with a downstream one, we can potentially make
the intermediate matrix element local and eliminate both the communication and the
synchronization costs. By merging two peer operations that share some of their input matrix
element accesses, we again reduce the amount of network traffic. Merging increases the task
granularity, thus reducing the number of tasks that must be scheduled at run-time and the
total scheduling overhead. The desired granularity can be adjusted to match the system size
(i.e., number of processors) and the actual cost structure of the run-time system and
hardware, insulating the programmer from details of the run-time environment. Moreover, once
most of the accesses to a matrix element come from a particular task, it makes sense to
allocate that matrix element and task to the same processor node.
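
The effect of merging an upstream operation with a downstream one can be illustrated with a
small sketch. The Python below is not the PIC code or the Alewife compiler; the two stage
functions, the cost constants, and the one-task-per-element model are all assumptions chosen
only to show how fusing halves the number of scheduled tasks and removes the synchronization
on the intermediate element.

```python
# Hypothetical cost model (all constants are illustrative assumptions).
TASK_OVERHEAD = 10   # assumed cycles to create and schedule one task
SYNC_COST = 5        # assumed cycles per producer-consumer synchronization
WORK_PER_OP = 3      # assumed cycles of useful work (a sum of a few products)


def upstream(x):
    """Illustrative first operation: produces one intermediate element."""
    return 2 * x + 1


def downstream(t):
    """Illustrative second operation: consumes the intermediate element."""
    return t * t


def run_unmerged(inputs):
    """One task per operation per element; each intermediate is synchronized."""
    n = len(inputs)
    intermediate = [upstream(x) for x in inputs]      # n fine-grain tasks
    outputs = [downstream(t) for t in intermediate]   # n more fine-grain tasks
    cost = 2 * n * (TASK_OVERHEAD + WORK_PER_OP) + n * SYNC_COST
    return outputs, 2 * n, cost


def run_merged(inputs):
    """Upstream and downstream fused: the intermediate stays local to the task."""
    n = len(inputs)
    outputs = [downstream(upstream(x)) for x in inputs]  # n coarser tasks
    cost = n * (TASK_OVERHEAD + 2 * WORK_PER_OP)
    return outputs, n, cost
```

Both functions compute the same outputs; under the assumed constants the merged version
schedules half as many tasks and pays no synchronization cost, at the price of less exposed
parallelism.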


5.2.8  Applications 

Johnson, Wu, Chan, and others have been involved in developing several large parallel
applications, including Speech Recognition, Particle in Cell, and solutions of partial differential
equations. Performance results on real parallel applications are essential to effectively
evaluate our ideas on automatic locality management.

5.3  NuMesh 

Work by Agarwal, Dally, Knight, Pratt, Ward, and others continues the NuMesh project
whose conception was reported last year. The NuMesh constitutes the first CAG-wide
research effort, and is consequently interesting from the dual standpoints of social and
computer engineering.

The basis for the NuMesh is the recognition that extant technologies for interconnecting
digital subsystems provide topological flexibility only at substantial performance cost.
Communication schemes designed to suppress the impact of physical distances among modules
from the design space of the system architect, such as backplane buses and hypercubes,
typically degrade best-case communication times to accommodate worst-case distances for
reasons of simplicity. At the lowest level, drive levels and times must anticipate the
worst-case loading allowed by their interconnect rules. This leads to a fundamental tension between
flexibility in the physical and topological layout of a system’s components and the degree of
optimization afforded their communication: if a pin on a chip is capable of driving a foot of
PC trace and ten inputs, resources (hardware, time, energy) are wasted when it is used to
drive a single input on an adjacent chip.

One reaction to such inefficiency is to custom-engineer the drivers and receivers on each signal
line to reflect the physical properties of the associated electrical conductor. This approach is
followed to some extent in very high performance systems, in which tedious analog-domain
transmission line analysis techniques are applied to each critical signal conductor in a
complicated system. While some such detailed optimization can be justified in high-end system
designs, it necessarily stops far short of the ad hoc optimization, say, of individual output
pads. Moreover, the prohibitive cost of this methodology rules out its use in cost-effective
systems.

Figure 5.5: Schematic View of a NuMesh

The NuMesh project explores an alternative reconciliation of communication flexibility with
performance: rather than forcing a universal communication technology to conform to
arbitrary network topologies, the NuMesh forces higher-level system designs to conform to a
rigidly specified universal communication topology.

5.3.1  The  NuMesh  Abstraction 

The major goal of the NuMesh project is the definition of and support for a standard
communication and interconnect scheme for modules of arbitrarily complex digital systems.
Abstractly, a NuMesh consists of an arbitrary number of nodes which (partially) populate a
three dimensional mesh, as diagrammed schematically in Figure 5.5.

Each  node  in  the  mesh  constitutes  a  digital  subsystem  which  communicates  directly  with 
each  of  its  six  orthogonal  neighbors  via  a  number  of  dedicated  signal  lines.  Each  signal 
conductor  is  unidirectional,  having  exactly  one  receiver  and  one  driver  (on  adjacent  nodes). 
Moreover,  the  physical  characteristics  of  each  conductor  are  rigidly  fixed.  This  mechanical 
and  topological  rigidity  allows  bandwidth  and  latency  parameters  of  communication  between 
adjacent  nodes  to  be  highly  optimized;  our  goal  is  on  the  order  of  one  GHz  for  each  signal 
line  in  the  eventual  standard. 

Each node combines a common communication substrate with the application-specific logic
it adds to the system. The communication portion of a node includes a very high speed
clocked finite state machine which is programmed (in RAM) to mediate local
communications, as well as drivers and receivers and data paths which interconnect them. The
Communication FSMs (CFSMs) occupying the nodes of a NuMesh operate synchronously at
the basic communication clock rate (eventually 1 GHz). They are typically pre-programmed
to follow a fixed, systolic communication pattern whose detailed choreography is produced
by whatever compiler or CAD tool also dictates the selection and layout of the modules
themselves.

The  general  idea  is  that  each  module’s  communications  FSM  be  programmed  to  follow  a 
periodic  pattern  of  interactions  with  neighbors.  Although  the  interactions  may  vary  among 
processors,  the  periods  will  be  identical.  If  module  A  transfers  a  word  to  its  right-hand 
neighbor  B  on  clock  37  of  each  period,  then  A’s  FSM  will  be  programmed  to  drive  its  lines 
to  B  on  that  clock,  while  B  will  be  programmed  to  load  in  data  from  A.  By  appropriate  design 
of  transition  tables,  arbitrary  systolic  communication  patterns  may  be  implemented  among 
processors. In some cases, words loaded by a module are destined to be read subsequently by
that module’s DSP; in other cases, they are routed (typically on the next clock) to another
neighbor without DSP intervention or even awareness.
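
The periodic schedule described above can be sketched as a small simulation. This is only an
illustrative model, not the NuMesh CFSM design: the `Node` class, the table format, the
four-clock `PERIOD`, and the one-dimensional row of three nodes are assumptions made for the
example, and the receiving node's "load" is modeled implicitly as a latch into the neighbor's
register at the end of the clock.

```python
PERIOD = 4  # assumed schedule length in clocks; the same for every node


class Node:
    """One node of a hypothetical 1D row: a transition table plus registers."""

    def __init__(self, name, table):
        self.name = name
        self.table = table                         # phase -> [(source, sink), ...]
        self.regs = {"left": None, "right": None}  # words latched from neighbors
        self.outbox = []                           # words queued by the local processor
        self.inbox = []                            # words delivered to the local processor


def step(nodes, clock):
    """Advance every node by one communication clock."""
    phase = clock % PERIOD
    latched = []  # transfers become visible to receivers only after this clock
    for i, node in enumerate(nodes):
        for source, sink in node.table.get(phase, []):
            if source == "proc":
                word = node.outbox.pop(0) if node.outbox else None
            else:
                word, node.regs[source] = node.regs[source], None
            if sink == "proc":
                node.inbox.append(word)
            else:
                j = i + 1 if sink == "right" else i - 1
                port = "left" if sink == "right" else "right"
                if 0 <= j < len(nodes):
                    latched.append((j, port, word))
    for j, port, word in latched:
        nodes[j].regs[port] = word


def demo():
    # A drives a word to B on phase 0; B forwards it on the next clock without
    # involving its processor; C hands it to its processor on phase 2.
    a = Node("A", {0: [("proc", "right")]})
    b = Node("B", {1: [("left", "right")]})
    c = Node("C", {2: [("left", "proc")]})
    a.outbox = ["w0", "w1"]
    nodes = [a, b, c]
    for clock in range(2 * PERIOD):
        step(nodes, clock)
    return c.inbox, b.inbox
```

In this sketch the words pass through B purely as through traffic: B's processor inbox stays
empty while C receives one word per period.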


Figure 5.6: NuMesh Node

Communication  in  a  NuMesh  thus  typically  follows  a  static,  precompiled,  systolic  pattern  in 
which  every  node  can  in  theory  send  data  to  each  of  its  neighbors  during  every  cycle  of  the 
communication  clock.  Certain  of  the  data  received  at  a  node  will  be  routed  by  that  node’s 
CFSM  to  the  processor,  or  other  application-specific  logic  housed  in  the  node.  However,  we 
expect  that  the  bulk  of  the  data  received  at  a  node  will  be  forwarded,  as  through  traffic, 
to  another  neighbor  during  a  subsequent  cycle  by  the  CFSM.  Our  goal  is  to  make  this 
forwarding  efficient,  allowing  remote  communications  to  be  implemented  as  multiple  store- 
and-forward  steps  with  performance  that  compares  favorably  with  more  conventional  remote 
interconnect  schemes  such  as  buses.  It  is  our  presumption  of  a  preponderance  of  remote 
traffic  that  motivates  our  design  goal  that  each  CFSM  support  an  aggregate  communication 
bandwidth  much  higher  than  that  required  by  the  processor  (or  other  application-specific 
logic)  supported  by  the  node  containing  it. 

Figure 5.7: Snapshot of NuMesh Communications


Figure 5.7 depicts a typical pattern of communications at some point during a
computation on a 2D NuMesh configuration. Note that communications between adjacent nodes
(indicated by short arrows) take place during the single clock cycle which this snapshot
depicts. Remote communications, depicted by longer arrows, require several cycles; while
the first step along each such path is taken during the cycle depicted, successive steps will
occupy subsequent cycles.

Certain  algorithms  may  benefit  from  flow  control  and  other  synchronization  measures  in  their 
underlying  communications.  These  might  be  superimposed  on  the  primitive  (branch-free) 
communication  mechanism  by  software  convention,  allowing  certain  data  words  to  contain 
control  information.  Moreover,  CFSM  interpretation  of  selected  data  words  provides  a  basis 
for  dynamic  routing  of  selected  data  messages  by  the  CFSM  alone,  considerably  enhancing 
the  flexibility  of  the  NuMesh  communications  substrate.  The  approach  to  such  dynamics 
suggested  by  the  NuMesh  structure  involves  preallocating  (at  compile  time)  certain  cycles  of 
the  periodic  communication  schedule  to  dynamically  routed  packets,  identifying  at  that  time 
the  cycles  at  which  each  CFSM  will  interpret  incoming  data  as  a  potential  message  header. 
Such  data  will  dictate,  perhaps  after  a  several-clock  latency  (to  accommodate  decision  logic), 
the  routing  of  the  message  body  to  be  transferred  on  subsequent  clock  cycles.  Hardware 
support  for  such  CFSM  decisions  is  currently  a  subject  of  active  study. 
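
One way to picture the preallocation of schedule cycles to dynamically routed packets is the
sketch below. It is an assumption-laden illustration rather than the hardware under study:
the slot markers `HEADER` and `BODY`, the convention that a header word is simply the name of
an output port (or `None` when no message is present), and the port names are all
hypothetical.

```python
HEADER, BODY = "header", "body"  # hypothetical markers for dynamic slots


def run_period(schedule, words):
    """Route one period of incoming words.

    schedule[i] is either a fixed output port name (a static slot), HEADER
    (interpret the incoming word as a potential message header), or BODY
    (forward the word to the port chosen by the most recent header).
    Returns a list of (clock, port, word) transfers.
    """
    current_port = None
    transfers = []
    for clock, (slot, word) in enumerate(zip(schedule, words)):
        if slot == HEADER:
            # Assumed encoding: the header word names the output port.
            current_port = word
        elif slot == BODY:
            if current_port is not None:
                transfers.append((clock, current_port, word))
        else:
            # Static slot: the destination was fixed at compile time.
            transfers.append((clock, slot, word))
    return transfers
```

For example, with `schedule = ["east", HEADER, "north", BODY, BODY]` the static slot between
the header and the body slots stands in for the several-clock decision latency mentioned
above, while static traffic continues unaffected.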

5.3.2  NuMesh  Prototype  Hardware 

During the 1989-90 academic year, a feasibility study for NuMesh ideas was started with the
design and prototyping of primitive NuMesh nodes built using off-the-shelf parts. Work by
Mackenzie, Jenez, Abdalla, Olsen, and others has led to a 4-neighbor, 2D NuMesh node
fabricated on a pair of stacked printed circuit cards. Each node carries a TI TMS320C30 digital
signal processor chip along with local memory, leading to an aggregate peak performance
of 33 MFlops/node. The nodes are designed to plug together to form desktop prototype
NuMesh arrays, providing an early vehicle for software development and exploration of
candidate applications.

Although we expect these prototype nodes to be useful both in evaluation of NuMesh ideas
and in pursuit of practical applications, the expedient technology forces many serious
compromises. In addition to clear size and cost disadvantages, the prototype nodes are clocked
at about 30 MHz and thus impose bandwidth limitations which are more than an order of
magnitude below our performance goals. Moreover, their design supports only static
communication models: the CFSM is not equipped to interpret data and make routing decisions
without processor intervention. We expect all of these restrictions to be relaxed somewhat
in a subsequent prototype exploiting custom silicon.

Currently, the prototype NuMesh hardware is interfaced to a NuBus-based Macintosh which
provides host system services and software support; interfaces to alternative workstations
are contemplated in the future. The nodes may be bootstrapped from the Macintosh through
a scheme, developed by Dujari, that uses only the communication and CFSM logic on each node.
Thus, a NuMesh configuration can be mapped and explored by the host independently of
the processors on each node, allowing the communication substrate to be configured and
diagnosed as a separable subsystem.

Two nodes are operational, and 15 more are under construction; thus we anticipate an
operational four-by-four NuMesh array during the summer. CD-quality analog I/O is provided
by an interface developed by Handley, allowing application of these prototypes to speech
recognition and related signal processing applications.

5.3.3  NuMesh  Prototype  Software 

Work by Fetterman, Jenez, Metcalfe, and Trowbridge addresses the development of an initial
software environment for our prototype NuMesh nodes. The goal is to translate source code
from a static block diagram language (similar to Consort and related real time languages)
to a fully configured multiprocessor NuMesh, generating in the process both DSP code and
CFSM programming.

The  compilation  problem  is  complicated  considerably,  even  in  the  context  of  our  limited 
source  language,  by  the  need  to  make  compile-time  decisions  regarding  the  distribution  of 
subcomputations  among  nodes.  In  general,  such  decisions  cannot  be  made  at  a  machine- 
independent  level,  since  they  involve  reconciling  computation  times  with  communication 
overhead.  In  order  to  confine  processor  dependencies  to  an  isolated  code  generator  module, 
we  provide  for  dyadic  communications  between  machine-independent  compiler  modules  faced 
with  distribution  and  scheduling  decisions  and  the  processor-specific  code  generator  modules 
capable  of  estimating  (or  bounding)  run-times.  These  interactions  make  use  of  a  low  level 
processor-independent  intermediate  code  in  a  data  structure  elaborated  repeatedly  by  several 
disparate  compiler  modules. 
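
The division of labor between the machine-independent modules and the processor-specific
code generator can be sketched as a simple query interface. Everything named here is
hypothetical: the `CodeGenerator` protocol, the `ToyC30Estimator` class, its made-up cycle
counts, and the decision rule are assumptions for illustration, not the actual compiler
structure.

```python
from typing import Protocol


class CodeGenerator(Protocol):
    """Processor-specific module that can estimate (or bound) run-times."""

    def estimate_cycles(self, fragment: str) -> int: ...


class ToyC30Estimator:
    """Stand-in estimator; the cycle counts below are invented for the example."""

    COSTS = {"fir_filter": 400, "fft_64": 900, "scale": 20}

    def estimate_cycles(self, fragment: str) -> int:
        return self.COSTS[fragment]


def should_distribute(codegen: CodeGenerator, frag_a: str, frag_b: str,
                      comm_cycles: int) -> bool:
    """Machine-independent decision for two independent fragments.

    Returns True when running them on separate nodes, plus the assumed
    communication overhead, is estimated to beat running them back to back
    on a single node.
    """
    a = codegen.estimate_cycles(frag_a)
    b = codegen.estimate_cycles(frag_b)
    sequential = a + b
    parallel = max(a, b) + comm_cycles
    return parallel < sequential
```

The partitioning logic never inspects processor details directly; it only asks the code
generator for timing estimates, which is the kind of isolation the paragraph above describes.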

Trowbridge completed a Macintosh-based frontend for the translator, and Fetterman is
completing a TMS320C30 code generator which produces both optimized pipeline code and
timing information from machine-independent input fragments. Metcalfe completed a
UNIX-based simulator which allows experiments to be conducted on an approximate model of
NuMesh computations.

Our  goal  is  to  complete  this  initial  support  software  during  the  summer  and  demonstrate  it, 
using  our  prototype  NuMesh  nodes,  on  one  or  two  modest  but  representative  applications. 
Speech  recognition,  which  (via  Zue’s  group)  provided  the  initial  stimulus  for  the  NuMesh, 
remains  the  primary  source  for  candidate  application  code.  A  second  application  area  of 
interest  is  finite  element  modeling,  currently  being  explored  by  Arthur  Altman  who  is  visiting 
CAG  from  Texas  Instruments. 

5.4 Publications

[1] A. Agarwal. Analysis of Cache Performance for Operating Systems and
Multiprogramming. Kluwer Academic Publishers, 1989.

[2] A. Agarwal and M. Cherian. Adaptive backoff synchronization techniques. In
Proceedings of the 16th Annual International Symposium on Computer Architecture, IEEE,
June 1989.

[3]  A.  Agarwal  and  A.  Gupta.  Temporal,  Processor,  and  Spatial  Locality  in  Multiprocessor 
Memory  References.  Plenum  Press,  1989.  S.  Tewksbury,  editor.  Also  appears  as  MIT 
VLSI  Memo,  1989. 

[4] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. ACM
Transactions on Computer Systems, May 1989.

[5] A. Agarwal and M. Huffman. Blocking: exploiting spatial locality for trace compaction.
In Proceedings of ACM SIGMETRICS 1990, May 1990.

[6] A. Agarwal, B-H. Lim, D. Kranz, and J. Kubiatowicz. APRIL: A processor architecture
for multiprocessing. In Proceedings of the 17th Annual International Symposium on
Computer Architecture, June 1990.

[7]  A.  Agarwal.  Limits  to  Network  Performance.  MIT  Laboratory  for  Computer  Science, 
November  1989.  Also  MIT  VLSI  Memo  1989.  Submitted  for  publication. 

[8] A. Agarwal. A Locality-based Multiprocessor Cache Interference Model. Submitted for
publication. Also MIT VLSI Memo 1989.

[9]  A.  Agarwal.  Performance  Tradeoffs  in  Multithreaded  Processors.  VLSI  Memo  89-566, 
MIT  Laboratory  for  Computer  Science,  September  1989.  Submitted  for  publication. 

[10] D. Chaiken, C. Fields, K. Kurihara, and A. Agarwal. Directory-based cache-coherence
in large-scale multiprocessors. IEEE Computer, June 1990.

[11] D. Kranz, R. Halstead, and E. Mohr. Mul-T: a high-performance parallel Lisp. In
Proceedings of SIGPLAN ’89, Symposium on Programming Languages Design and
Implementation, June 1989.

[12] E. Mohr, D. A. Kranz, and R. H. Halstead. Lazy task creation: a technique for
increasing the granularity of parallel tasks. In Proceedings of Symposium on Lisp and
Functional Programming, June 1990. To appear.

[13] D. Nussbaum, I. Vuong, and A. Agarwal. Modeling a circuit-switched multiprocessor
interconnect. In Proceedings of ACM SIGMETRICS 1990, May 1990.

[14]  S.  Ward  and  R.  Halstead.  Computation  Structures.  MIT  Press,  1989. 

Theses Completed

[1] K. Abdalla. A TMS320C30-based Digital Signal Processing Board for a Prototype
NuMesh Module. Master’s thesis, MIT Department of Electrical Engineering and
Computer Science, June 1990.

[2]  C.  Barclay.  Parallel  Programming  Language  Constructs  for  L.  Master’s  thesis,  MIT 
Department  of  Electrical  Engineering  and  Computer  Science,  May  1990. 

[3] K. Ishii. Caching and Memory Reference Patterns in Parallel Lisp Applications.
Master’s thesis, MIT Department of Electrical Engineering and Computer Science, June
1990.

[4] T. King. Test of Metastability in Synchronizer Circuits. Master’s thesis, MIT
Department of Electrical Engineering and Computer Science, September 1989.

[5]  J.  Morrison.  Scalable  Multiprocessor  Architecture  Using  Cartesian  Network-relative 
Addressing.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer 
Science,  September  1989. 

[6]  T.  Nguyen.  Analysis  of  Spatial-locality-based  Trace  Compaction  Methods.  Bachelor’s 
thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  June  1990. 

[7] N. Osgood. Automatic and Near-exhaustive Derivation of Machine-dependent
Optimizations. Bachelor’s thesis, MIT Department of Electrical Engineering and Computer
Science.

[8]  M.  Powell.  A  Simulator  for  Multiprocessor  Implementations  of  L.  Master’s  thesis,  MIT 
Department  of  Electrical  Engineering  and  Computer  Science. 

[9]  I.  Shen.  The  Utility  of  Block  Transfer  Mechanisms  in  the  Alewife  Multiprocessor. 
Bachelor’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science, 
June  1990. 

[10]  S.  Smoot.  JUNE-BUG:  A  Parallel  Debugger  for  the  Alewife  Machine.  Bachelor’s  thesis, 
MIT  Department  of  Electrical  Engineering  and  Computer  Science,  June  1990. 

[11] S. Trowbridge. A Programming Environment for the NuMesh Computer. Master’s
thesis, MIT Department of Electrical Engineering and Computer Science, May 1990.

[12] J. Vidal. Parallelizing Compiler for Matrix Expressions. Bachelor’s thesis, MIT
Department of Electrical Engineering and Computer Science, May 1990.

Theses  in  Progress 

[1] A. Ayers. PhD thesis, MIT Department of Electrical Engineering and Computer
Science, expected September 1991.

[2] D. Chaiken. Cache Coherence Protocols for Large-scale Multiprocessors. Master’s
thesis, MIT Department of Electrical Engineering and Computer Science, expected
October 1990.

[3]  M.  Cherian.  A  Study  of  Backoff  Barrier  Synchronization.  Master’s  thesis,  MIT 
Department  of  Electrical  Engineering  and  Computer  Science,  June  1989. 

[4] M. Fetterman. Master’s thesis, MIT Department of Electrical Engineering and
Computer Science, expected September 1990.

[5] K. Kurihara. Performance Evaluation of Large-scale Cache-coherent Multiprocessors.
Master’s thesis, MIT Department of Electrical Engineering and Computer Science,
expected July 1990.

[6] B-H. Lim. Waiting Algorithms for Synchronization in Large-scale Multiprocessors.
Master’s thesis, MIT Department of Electrical Engineering and Computer Science, expected
October 1990.

[7]  G.  Maa.  PhD  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science, 
expected  December  1991. 

[8] D. Nussbaum. Run-time Locality Enhancement on Scalable Multiprocessors. PhD
thesis, MIT Department of Electrical Engineering and Computer Science, expected
June 1991.

[9]  C.  Selvidge.  Compile-time  Latency  Management:  Exploiting  Program  Predictability 
to  Achieve  Memory  Latency  Tolerance.  PhD  thesis,  MIT  Department  of  Electrical 
Engineering  and  Computer  Science,  expected  June  1990. 

Talks 

[1]  A.  Agarwal.  The  MIT  Alewife  machine.  Lecture  given  at  UC  Berkeley,  Stanford 
University,  DEC  Systems  Research  Center,  Distinguished  Lecture  Series,  University  of 
Rhode  Island,  June  1989. 

[2] A. Agarwal. The MIT Alewife machine: exploiting locality in a large-scale
multiprocessor. Lecture given at Encore Computer, Distinguished Lecture Series, April 1990.

[3]  A.  Agarwal.  Exploiting  locality  for  scalability.  Lecture  given  at  IEEE  Workshop  on 
Interconnections  within  Digital  Systems,  Santa  Fe,  NM,  May  1990. 

[4]  A.  Agarwal.  The  L  word.  Lecture  given  at  the  MIT  Laboratory  for  Computer  Science, 
Annual  Meeting,  Harwichport,  MA,  June  1989. 

[5]  A.  Agarwal  and  B-H.  Lim.  Exploiting  locality  for  scalability  in  the  MIT  Alewife 
machine.  Lecture  given  at  the  Workshop  on  Shared  Memory  Multiprocessors,  Seattle, 
WA,  May  1990. 

[6]  A.  Agarwal.  Multiprocessor  address  tracing:  the  agony  and  the  ecstasy.  Lecture  given 
at  the  Workshop  on  Multiprocessor  Performance  Evaluation,  Seattle,  WA,  May  1990. 

[7] A. Agarwal. APRIL: a processor architecture for multiprocessing. Lecture given
at the Symposium on Computer Architecture, Seattle, WA (May 1990); Workshop on
Multithreaded Processors, MIT Laboratory for Computer Science (February 1990).

[8]  A.  Agarwal.  Blocking:  exploiting  spatial  locality  for  trace  compaction.  Lecture  given 
at  ACM  SIGMETRICS  1990,  Boulder,  CO,  May  1990. 

[9]  K.  Johnson.  Exploiting  concurrency  in  Voyager’s  lexical  access  subsystem.  Lecture 
given  to  the  Spoken  Language  Systems  Group,  MIT  Laboratory  for  Computer  Science, 
December  1989. 

[10]  D.  Kranz.  Mul-T:  a  high-performance  parallel  Lisp.  Lecture  given  at  SIGPLAN’89, 
Portland,  OR;  USA/Japan  Workshop  on  Parallel  Lisp,  Sendai,  Japan,  June  1989. 

[11]  J.  Kubiatowicz.  Hardware-software  tradeoffs  in  the  MIT  Alewife  machine.  Lecture 
given  at  the  Workshop  on  Shared  Memory  Multiprocessors,  Seattle,  WA,  May  1990. 

[12]  B.  H.  Lim.  A  run-time  system  for  the  MIT  Alewife  machine.  Lecture  given  at  DEC 
Systems  Research  Center,  June  1990. 

6.1 1989-90 Activity

Most of our activity last year was centered on the CAM-8 project. This is a high
performance cellular automata multiprocessor that will allow one to explore a new band of the
computational spectrum.

The task is immense, as it includes, among other things: (a) functional design at the system
level, at the board level, and at the chip level; (b) interfacing with a host (the
SPARCstation-1) that was just introduced, and whose technical specifications (S-bus, PROM
monitor, etc.) are just beginning to be available; (c) host software at different levels
(registers, device drivers, CAM-8 control, user interface, display and analysis, and
experiment management); (d) issues related to massive scalability (three-dimensional
interconnection, power distribution and heat removal, clock distribution, synchronization,
and error detection and recovery); and (e) circuit and behavioral simulation at the chip
level (we are designing custom gate arrays) and at the multi-chip, multi-board level.

We also became project managers, personnel managers, computer system managers, etc.
After appropriate searches, we hired a full-time research programmer and two circuit-design
consultants. We purchased eight SPARCstations and configured them for a number of
different tasks. Following successful negotiations, we obtained from VTI the free use of their VLSI
design and simulation software, and behavioral simulation software from Veralog. Installing
and learning to use all of these tools is an enormous job.

We also began tentatively exploring some form of industrial partnership in the CAM-8
project.

Of  course,  we  had  to  continue  doing  a  fair  amount  of  responding  to  real  time  interrupts, 
such  as  conferences,  meetings,  papers  due,  reports,  proposals,  refereeing,  etc. 

In particular, we wrote a new three-year proposal, Information Mechanics, for NSF, and we
participated in a proposal for an NSF Science and Technology Center on Quantum-effect
Physics, Electronics, and Computation Structures, with Professors Hank Smith and Dimitri
Antoniadis.

Besides work by the research staff [285][286][284][219][282][283][281][220], a fair amount of
theoretical research was conducted by our Ph.D. students [53][148][272][271] and our visitors
[69][78][73][72]. For his Master’s thesis, David Harnanan did a remarkable job at designing
and simulating the interface between CAM-8 and its host.

Publications

[1] M. Biafore. Two-dimensional Universal Automaton with Electron-like Tokens.
Technical Report MIT/LCS/TM-429, MIT Laboratory for Computer Science, June 1990.

[2]  L.  Chalfin  and  B.  S.  Tsirelson.  Quantum/classical  correspondence  in  the  light  of  Bell’s 
inequalities.  Technical  Report  MIT/LCS/TM/420,  MIT  Laboratory  for  Computer 
Science,  1990. 

[3]  B.  Chopard.  Cellular  automata  model  for  the  diffusion  equation.  J.  Stat.  Phys.,  1990. 
Submitted  for  publication. 

[4] B. Chopard. A cellular automata model of large scale moving objects. J. Phys. A,
1990. To appear.

[5] F. Commoner. Synchronization graphs for quantum computation. To appear.

[6] H. Hrgovcic. The local interpretation of quantum mechanics. PhD thesis, MIT, 1990.
In progress.

[7]  N.  Margolus.  Parallel  quantum  computation.  In  Complexity,  Entropy,  and  the  Physics 
of  Information,  Addison-Wesley,  1990.  To  appear. 

[8] N. Margolus and T. Toffoli. Cellular automata machines. In Lattice Gas Methods for
Partial Differential Equations, pages 219-248, Addison-Wesley, 1990.

[9] M. Smith. Nuclear Fusion Through Dimensional Confinement. Technical Memo
MIT/LCS/TM-409, MIT Laboratory for Computer Science, August 1989.

[10] M. Smith. Representation of geometrical and topological quantities in cellular
automata. Physica D, 1990. To appear.

[11]  T.  Toffoli.  Analytical  Mechanics  from  Statistics:  T  =  dS/dE  Holds  for  Almost  Any 
System.  Technical  Memo  MIT/LCS/TM-407,  MIT  Laboratory  for  Computer  Science, 
August  1989. 

[12] T. Toffoli. Frontiers in computing. In Information Processing, Volume 1,
North-Holland, 1989.

[13]  T.  Toffoli.  How  cheap  can  mechanics’  first  principles  be?  In  Complexity,  Entropy,  and 
the  Physics  of  Information,  Addison-Wesley,  1990.  To  appear. 

[14]  T.  Toffoli.  Cellular  automata.  In  Encyclopedia  of  Physics,  VCH,  1990. 

[15] T. Toffoli and N. Margolus. Invertible cellular automata: a review. Physica D, 1990.
To appear.

[16] T. Toffoli and N. Margolus. Programmable matter. Physica D, 1990. To appear.

7.1 Introduction


The  Large  Scale  Parallel  Software  Group  began  in  the  fall  of  1989.  Its  research  is  focused  on 
software  issues  involved  in  making  effective  use  of  large  scale  multiprocessors.  Most  group 
members are working on two projects in this area: designing a parallel programming
language to support writing portable programs for MIMD parallel architectures; and developing
algorithms, and system and language support for writing fault-tolerant parallel programs.
As part of the first project to design a parallel programming language, we developed
techniques for implementing concurrent data structures that scale well and make effective use
of  local  caches.  In  addition,  some  group  members  are  doing  research  related  to  transaction 
processing  and  to  distributed  systems.  Our  research  in  these  areas  is  described  below. 

7.2  Portable  Parallel  Software 

We  are  designing  a  new  programming  language  to  support  the  implementation  and  execution 
of  parallel  programs.  The  language  is  intended  to  run  on  large  scale  MIMD  multiprocessors. 
It is clear that there will be many such machines, of both the shared-memory and
message-passing varieties, available in the not-too-distant future. To make use of the machines, and
to  evaluate  their  potential,  we  need  new  programming  languages  that  allow  them  to  be  used 
effectively. 

Our  language  is  intended  to  be  used  for  a  wide  range  of  applications,  including  both  symbolic 
and  numeric  computations  and  programs  that  have  side  effects.  We  expect  it  to  run  on  both 
shared-memory multiprocessors and message-passing multiprocessors. As a secondary goal,
we would like it to run on networked collections of uniprocessors and multiprocessors. In addition,
we  would  like  programs  to  be  portable,  with  good  performance,  across  a  broad  range  of 
MIMD  architectures. 

There  are  a  number  of  issues  that  must  be  addressed  to  build  efficient  parallel  programs, 
including  scheduling,  choice  of  an  appropriate  granularity  for  processes  and  data,  placement 
and  migration  of  processes  and  data,  effective  use  of  caches,  and  synchronization.  Some 
systems  hide  most  or  all  of  these  issues  from  the  programmer.  Others  expose  the  details  of 
the  specific  architecture.  The  first  approach  gives  portability,  since  the  programmer  writes 
little  architecture-dependent  code.  However,  it  is  difficult  to  get  good  performance  using  the 
first  approach.  The  second  approach  can  give  good  performance,  but  lacks  portability  and 
can  also  be  difficult  to  use. 

To achieve portability and performance, we are designing mechanisms that allow the
programmer to control issues such as scheduling, granularity, and placement in an
architecture-independent manner. For example, we developed a scheduling method that allows programs
to be written so that the grain size of processes adapts to the number of processors available
and to processor loads. The programmer supplies information that allows the system to
make informed decisions about which tasks are most usefully split into concurrent subtasks.
We are experimenting with mechanisms to achieve similar results for the other issues.
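
As a rough illustration of grain size adapting to the number of processors, the following Python sketch picks a chunk size from a programmer-supplied work estimate and the processor count. It is not the group's scheduling method: the names choose_grain, split_range, and parallel_sum_of_squares, and the tasks_per_processor heuristic, are assumptions made for this example, and processor load is ignored.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_grain(total_work, processors, tasks_per_processor=4, min_grain=1):
    """Pick a grain size so that each processor receives a few tasks.

    total_work is the programmer-supplied estimate of the amount of work;
    tasks_per_processor is an illustrative tuning knob, not a measured value.
    """
    target_tasks = max(1, processors * tasks_per_processor)
    grain = -(-total_work // target_tasks)  # ceiling division
    return max(min_grain, grain)


def split_range(lo, hi, grain):
    """Split [lo, hi) into consecutive subranges of at most `grain` items."""
    return [(start, min(start + grain, hi)) for start in range(lo, hi, grain)]


def parallel_sum_of_squares(n, processors):
    """Sum i*i for i in [0, n), with a grain size that depends on `processors`."""
    grain = choose_grain(n, processors)
    ranges = split_range(0, n, grain)
    with ThreadPoolExecutor(max_workers=processors) as pool:
        partials = pool.map(lambda r: sum(i * i for i in range(*r)), ranges)
        return sum(partials)
```

The same program text yields coarse tasks on a small machine and fine tasks on a large one, which is the portability property the paragraph above is after; a real system would also feed run-time load information into the decision.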

To judge performance, we plan to implement our language on one or more real
multiprocessors. In addition, however, we will use simulators to allow us to experiment with a wider
range of architectural features and to better understand how effective our approach is at
providing  portability.  To  avoid  building  separate  simulators  for  each  target  architecture,  we 
have  designed  and  are  building  a  retargetable  simulator,  which  is  easily  tailored  to  simulate 
different  architectures. 


7.3  Fault-tolerance 

Advances  in  hardware  technology  have  made  it  plausible  to  construct  a  parallel  computer 
with  hundreds  of  thousands  or  even  millions  of  processors  sometime  in  the  next  decade. 
Programming  such  machines  will  in  itself  be  difficult.  We  believe,  however,  that  an  equally 
difficult  challenge  will  be  that  of  maintaining  a  working  machine  in  the  presence  of  failures. 

Various  studies  have  shown  the  mean  time  between  failures  (MTBF)  for  current  machines 
to  be  on  the  order  of  a  few  weeks  or  months.  A  machine  100  times  as  big  would  have  an 
MTBF  on  the  order  of  a  few  hours  or  days.  Without  mechanisms  to  limit  the  amount  of 
work  lost  when  a  failure  occurs,  the  range  of  problems  that  can  be  solved  effectively  on  such 
large  machines  will  be  limited  to  those  that  take  relatively  little  time  to  compute.  However, 
existing  techniques  for  achieving  reliable  operation  in  the  face  of  failures  do  not  scale  to  large 
numbers  of  processors,  or  to  machines  with  very  large  numbers  of  hardware  components. 
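
The scaling claim above can be checked with a one-line calculation. The sketch assumes, purely for illustration, that a machine consists of N identical components that fail independently at a constant rate λ; the figures of 3 weeks and 3 months are example values chosen to stand for "a few weeks or months".

```latex
\mathrm{MTBF}_{N} \approx \frac{1}{N\lambda}
\quad\Longrightarrow\quad
\mathrm{MTBF}_{100N} \approx \frac{\mathrm{MTBF}_{N}}{100},
\qquad
\frac{3\ \text{weeks}}{100} = \frac{504\ \text{h}}{100} \approx 5\ \text{h},
\qquad
\frac{3\ \text{months}}{100} \approx \frac{2160\ \text{h}}{100} \approx 22\ \text{h}.
```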

The  goal  of  this  research  is  to  develop  software  techniques  that  provide  a  low  overhead 
and  scalable  approach  to  fault-tolerance.  (Our  efforts  are  directed  at  tolerating  hardware 
and  communication  failures;  we  are  not  addressing  the  problem  of  tolerating  design  and 
programming  errors.)  Adequately  addressing  the  challenge  of  ensuring  reliable  operation 
for  large  scale  multiprocessors  requires  improvements  at  all  levels,  from  low  level  hardware 
components  through  application  software.  We  are  focusing  on  software  techniques  because 
they  provide  more  flexibility  in  tailoring  the  level  of  redundancy  and  reliability  to  the  needs 
of  applications,  and  also  make  it  easier  to  reconfigure  machines  dynamically  when  failures 
occur.  In  addition,  by  focusing  initially  on  the  requirements  of  applications  and  of  system 
software,  we  will  gain  a  better  understanding  of  what  should  be  provided  by  lower  levels. 

Scalability  is  important  to  make  effective  use  of  large  scale  machines.  In  addition,  low 
overhead is essential to allow fine-grained parallel computations. In particular, if the
fault-tolerance mechanisms significantly increased the cost of communication, then programmers
would  be  forced  to  use  a  larger  grain  size  to  amortize  the  overhead. 

Fault-tolerance methods typically exhibit a tradeoff between overhead during normal
operation and the cost of recovering from a failure. Fault-tolerance mechanisms for parallel systems
can be loosely classified as either optimistic or pessimistic. Purely optimistic methods record
checkpointed process states and other information asynchronously, without forcing delays at
particular points in the computation. After a failure, such methods must roll back one or
more processes to earlier states to find a consistent state of the system. In contrast,
pessimistic methods prevent the failure of one process from causing other processes to roll back.
Optimistic methods attempt to minimize overhead, but may take a long time to recover.
Pessimistic methods impose significant delays, but can recover quickly.

Pessimistic  methods  do  not  meet  our  goal  of  low  overhead.  Purely  optimistic  methods, 
however,  do  not  scale  well:  they  perform  very  poorly  in  large  scale  multiprocessors.  There 
are two reasons for this. First, recovery from a single failure could involve every processor in
the  machine,  and  could  take  long  enough  that  another  failure  would  be  very  likely  to  occur 
before  recovery  from  the  first  failure  completed.  As  a  result,  useful  forward  progress  could 
occur  very  slowly.  Second,  given  a  fixed  probability  of  failure  for  each  individual  component 
in  each  step,  we  can  show  that  the  expected  expansion  factor  in  execution  time  using  a  purely 
optimistic  method  grows  exponentially  with  the  number  of  components,  even  ignoring  the 
fact  that  recovery  takes  longer  for  larger  machines.  (The  expansion  factor  is  the  ratio  of 
the  actual  execution  time  for  a  computation  in  the  presence  of  failures  to  the  amount  of 
time  required  for  the  computation  when  no  failures  occur.)  Thus,  for  any  fixed  component 
reliability,  a  purely  optimistic  method  breaks  down  for  very  large  machines. 
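
To see why exponential growth is plausible, consider a deliberately crude model. This is an assumption made here for illustration, not the analysis referred to above. Suppose each of n components fails independently with probability p in each step, and a step counts as useful only if no component fails during it, in which case it is simply repeated. Then the probability q that a step succeeds and the expected number of attempts per useful step are:

```latex
q = (1-p)^{n}, \qquad
E[\text{attempts per step}] = \frac{1}{q} = (1-p)^{-n} = e^{-n\ln(1-p)} \ge e^{pn}.
```

Even in this toy model, which ignores cascaded rollback entirely, the expansion factor is at least e^{pn}, so for fixed p it grows exponentially in n.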

The  best  approach  seems  to  be  an  optimistic  approach  that  limits  the  amount  of  work 
required  to  recover  from  a  failure;  such  an  approach  will  sometimes  require  a  process  to  wait 
while  recovery  information  is  recorded,  but  the  delays  should  be  shorter  and  less  frequent 
than with purely pessimistic methods. Over the past year, we have designed a “limited
optimistic” method that represents an intermediate point in the tradeoff between overhead
during normal operation and recovery time: its overhead is higher than that of purely optimistic
methods but lower than that of pessimistic ones, while its recovery time is shorter than that of
optimistic methods but longer than that of pessimistic ones. The technique limits the length of a chain of
processes  that  can  be  involved  in  a  cascaded  rollback  after  a  failure;  this  limits  the  recovery 
time,  and  prevents  the  blowup  in  the  expansion  factor  that  occurs  with  purely  optimistic 
methods. 

In  general,  it  is  difficult  to  achieve  both  low  overhead  and  fast  recovery.  As  a  way  of 
circumventing  this  tradeoff  and  achieving  both  very  low  overhead  and  fast  recovery,  Anthony 
Joseph  has  been  developing  techniques  for  application-specific  fault-tolerance.  The  idea  is  to 
take  advantage  of  properties  of  applications  to  reduce  the  communication  and  coordination 
required  for  fault-tolerance,  to  reduce  the  work  lost  when  a  failure  occurs,  and  to  reduce 
the time taken to recover from a failure. Our initial experiments indicate that
application-specific methods can perform significantly better for some applications than any
application-independent method.

Application-specific methods have the disadvantage that they require additional
programming that could add significantly to the complexities of writing a parallel program. In
our experience so far, however, the added complexity is actually quite small. For
example, Anthony Joseph has been looking at numerical algorithms. For some algorithms, such
as asynchronous iterative methods [36], processes can take checkpoints and recover from
failures without any coordination or communication. This purely local method eliminates
almost all of the overhead of more general methods, and also loses much less work when
a failure occurs. Simulations show that it performs significantly better than other
methods [154]. Other examples have been studied by Henri Bal, who wrote several fault-tolerant
parallel programs in Argus [197] while visiting the Large Scale Parallel Software Group in
the fall of 1989. His experience is described in [32]. We are currently studying other
applications, both to understand what language and system support might simplify the task of
writing  fault-tolerant  parallel  programs,  and  to  understand  the  potential  performance  gain 
of  application-specific  methods  over  application-independent  methods. 
7.4  Concurrent  Data  Structures 

Many  parallel  programs  are  organized  as  a  collection  of  processes  accessing  shared  data 
structures,  through  which  processes  communicate  and  coordinate  with  each  other.  This 
organization  is  common  in  programs  written  for  shared-memory  architectures;  it  also  applies 
to  many  programs  written  for  message-passing  architectures,  particularly  those  written  in  an 
object-oriented  style.  Over  the  last  year,  we  developed  several  new  algorithms  for  concurrent 
data  structures.  Our  new  algorithms  reduce  contention  among  the  processes  sharing  the  data 
structure;  as  a  result,  our  algorithms  provide  significantly  better  performance  than  existing 
algorithms,  particularly  when  many  processes  are  using  the  data  structure  concurrently.  As 
part  of  our  work  on  concurrent  data  structures,  we  also  developed  new  techniques  for  software 
cache  management,  as  well  as  new  implementation  techniques  for  synchronization  primitives 
such  as  read-write  locks.  The  sections  below  describe  this  work  in  more  detail. 

7.4.1  Concurrent  B-trees 

B-trees  are  widely  used  in  databases  and  other  applications  (e.g.,  file  systems)  that  require 
fast  access  to  data  stored  on  disk;  they  are  also  useful  in  parallel  programs  that  require  fast 
access  to  shared  data,  even  if  the  data  is  stored  in  main  memory.  A  number  of  algorithms 
for concurrent access to B-trees have been developed (e.g., see [37][187][228][180][258]); all
of  the  previously  existing  algorithms  require  processes  to  lock  nodes  of  the  tree  in  such  a 
way  that  a  process  updating  a  node  blocks  all  other  processes  that  attempt  to  access  the 
node  while  the  update  is  being  performed.  We  developed  a  new  concurrent  B-tree  algorithm 
that  eliminates  the  need  for  a  process  traversing  down  the  tree  to  block  while  an  update  is 
propagated  up  the  tree  [290].  The  net  result  is  that  contention  at  non-leaf  nodes  of  the  tree 
is  virtually  eliminated. 

The  basis  of  our  new  algorithm  is  an  abstraction  that  is  similar  to  coherent  shared  memory, 
but provides a weaker semantics; we call this abstraction multi-version memory.
Multi-version memory is used in the algorithm for all non-leaf nodes of the B-tree, while coherent
shared memory is used for the leaves. Multi-version memory weakens the semantics of
ordinary  shared  memory  by  allowing  a  process  reading  data  to  be  given  an  old  version  of 
the  data.  (For  example,  it  might  simply  use  the  version  in  its  cache.)  While  this  weaker 
semantics  is  not  as  generally  useful  as  that  provided  by  coherent  shared  memory,  it  turns 
out  to  be  adequate  for  our  B-tree  algorithms.  We  describe  multi-version  memory  in  more 
detail  in  Section  7.4.3  below. 

As  described  below,  multi-version  memory  can  be  implemented  so  that  a  process  reading  data 
can  use  a  local  cached  copy,  and  almost  never  needs  to  be  delayed  while  waiting  for  messages 
that  update  or  invalidate  caches.  As  a  result,  our  new  concurrent  B-tree  algorithm  should 
continue  to  work  well  in  large  scale  parallel  systems  in  which  the  number  of  processors  sharing 
the  tree  is  large  or  the  communication  delay  between  processors  (or  between  processors 
and  the  global  memory  for  a  shared-memory  system)  is  large  relative  to  the  speed  of  local 
computation. 

Relatively little work has been done to study the performance of concurrent B-tree
algorithms. In a Master’s thesis due to be completed in September 1990, Paul Wang has been
evaluating the performance of our new algorithm and comparing it to that of other B-tree
algorithms. Four different B-tree algorithms have been implemented in Concurrent
Aggregates (CA), a language developed by Andrew Chien [71]. CA allows a collection of objects
to be viewed as a single logical object, thus making it easy to encapsulate cache management
algorithms for multi-version memory and for coherent shared memory. The resulting
programs have been run on a simulator for a message-passing multiprocessor. The simulations
are being used to study how the performance of each algorithm varies with the number of
worker processes accessing the B-tree, with the length of the delay for interprocessor
messages, and with the implementations of multi-version memory and coherent shared memory.
Results to date indicate that our new algorithm provides significantly lower latency and
higher throughput than the other algorithms.

7.4.2  Concurrent  Priority  Queues 

The  priority  queue  is  a  fundamental  data  structure  that  has  been  used  in  a  large  variety  of 
parallel algorithms, such as multiprocessor scheduling and parallel best-first search of
state-space graphs. In these algorithms, each process performs an access-think cycle, in which the
access is one of the insert, extract, decrease key, and delete operations on the priority queue.
For  his  Master’s  thesis,  Qin  Huang  has  been  designing  and  evaluating  algorithms  for  parallel 
priority  queues. 

Algorithms that ensure a strict semantics for the extract operation, which extracts the
minimum element from a priority queue, exhibit limited speedup because of the bottleneck caused
by  the  synchronization  needed  to  ensure  that  the  element  returned  by  the  extract  operation 
is  the  least  element.  To  avoid  this  bottleneck,  it  is  necessary  to  relax  the  specification  of  the 
extract  operation  so  that  it  is  not  required  to  return  the  least  element.  The  performance  of 
the  application  may  be  better  if  the  element  returned  is  close  to  the  minimum,  but  in  many 
applications  the  correctness  of  the  final  result  of  the  computation  does  not  depend  on  which 
element  is  returned. 

We designed two different algorithms that relax the specification of extract. One is based
on  Fibonacci  heaps,  the  asymptotically  most  efficient  data  structure  for  sequential  priority 
queues.  This  algorithm  keeps  a  cache  of  a  small  number  of  the  most  promising  elements 
of  the  queue;  a  process  executing  extract  selects  a  random  element  from  the  cache,  thus 
avoiding  a  single  serial  bottleneck.  Bottlenecks  in  accessing  the  root  list  of  the  Fibonacci 
heap  are  avoided  by  dividing  it  into  a  number  of  separate  sections,  each  of  which  can  be 
accessed  independently. 

The second approach, which we have called a concurrent priority pool, is based on a
combination of a concurrent B-tree algorithm and concurrent pools [212][176]. A concurrent pool
is  a  distributed  data  structure  for  managing  a  pool  of  resources;  it  is  divided  into  a  number  of 
segments  that  can  be  accessed  independently  to  add  and  remove  elements  from  the  pool.  A 
concurrent  priority  pool  is  like  a  concurrent  B-tree,  except  that  concurrent  pools  (extended 
to  handle  splitting  and  merging)  are  used  for  the  leaves.  The  extract  operation  attempts  to 
remove  an  element  from  the  leftmost  leaf  by  accessing  one  of  the  segments  in  that  leaf.  Since 
there are multiple segments, several extract operations can proceed concurrently without any
interference.

Both  algorithms  allow  the  quality  of  the  element  returned  by  extract  to  be  controlled,  in 
the  first  case  by  controlling  the  size  of  the  cache  of  promising  elements  and  in  the  second 
case by controlling the number of segments in each leaf of the tree. Higher speedups can be
obtained  by  relaxing  the  quality  of  the  returned  elements.  Thus,  they  permit  an  application 
to  tune  its  use  of  a  concurrent  priority  queue  to  balance  the  quality  of  the  elements  returned 
by  extract  against  the  contention  found  in  accessing  the  queue. 
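
The relaxed specification of extract can be illustrated with a small sequential sketch in Python. It is not either of the algorithms described above: it replaces the Fibonacci heap with a binary heap from the standard library, contains no concurrency control, and the class name RelaxedPriorityQueue and its cache_size parameter are assumptions made for this example. It shows only the tuning knob: extract returns a random member of a cache holding the cache_size smallest elements.

```python
import heapq
import random


class RelaxedPriorityQueue:
    """extract() returns one of the `cache_size` smallest elements, chosen at random."""

    def __init__(self, cache_size, seed=None):
        if cache_size < 1:
            raise ValueError("cache_size must be at least 1")
        self._cache_size = cache_size
        self._cache = []   # the most promising (smallest) elements
        self._heap = []    # everything else, kept as a binary heap
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._cache) + len(self._heap)

    def _rebalance(self):
        # Fill the cache, then swap until every cached element is no larger
        # than the smallest element left in the heap.
        while len(self._cache) < self._cache_size and self._heap:
            self._cache.append(heapq.heappop(self._heap))
        while self._heap and self._cache and self._heap[0] < max(self._cache):
            worst = max(self._cache)
            self._cache.remove(worst)
            self._cache.append(heapq.heapreplace(self._heap, worst))

    def insert(self, key):
        heapq.heappush(self._heap, key)
        self._rebalance()

    def extract(self):
        if not self._cache:
            raise IndexError("extract from an empty queue")
        i = self._rng.randrange(len(self._cache))
        self._cache[i], self._cache[-1] = self._cache[-1], self._cache[i]
        key = self._cache.pop()
        self._rebalance()
        return key
```

With cache_size set to 1 the queue behaves like a strict priority queue. Larger values loosen the guarantee to "one of the cache_size smallest", which in a concurrent implementation is what lets several extract operations avoid contending for the same element.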

Both  algorithms  have  been  implemented  in  Mul-T  [177],  along  with  a  concurrent  binary 
heap  algorithm  developed  at  the  University  of  Texas  at  Austin  [251].  Experiments  have 
been  performed  on  an  Encore  Multimax.  Results  to  date  show  that  the  two  new  algorithms 
provide  essentially  linear  speedup  for  up  to  10  processors  (as  many  as  the  machine  currently 
provides),  while  the  concurrent  binary  heap  is  limited  by  contention  at  the  root  of  the  heap 
to  a  speedup  of  about  4.  Further  experiments  are  planned,  both  on  an  Encore  machine 
with  more  processors  and  on  simulators,  to  see  how  well  the  algorithms  perform  with  more 
processors  and  to  understand  how  the  quality  of  the  elements  returned  by  extract  is  affected 
by  the  number  of  processors  concurrently  accessing  the  queue.  Applications  of  parallel 
priority  queues,  such  as  a  parallel  single-source  shortest  path  algorithm  and  a  parallel  solution 
to  the  traveling  salesman  problem,  are  also  being  tested  to  evaluate  the  performance  of  the 
different  parallel  priority  queue  algorithms  for  particular  applications. 

7.4.3  Software  Cache  Management 

In  many  parallel  applications,  caching  is  vital  for  achieving  high  performance.  For  example, 
the  root  of  a  B-tree  is  visited  by  every  operation  on  the  tree,  and  is  rarely  updated.  If 
only  a  single  copy  of  the  root  is  maintained  (either  in  global  memory  in  a  shared-memory 
architecture,  or  in  the  memory  of  a  single  processor  in  a  message-passing  architecture),  the 
root  is  likely  to  be  a  serious  limiting  factor  in  performance.  Caching  improves  performance 
in  part  by  allowing  data  to  be  accessed  in  local  memory,  thus  avoiding  the  delay  involved  in 
accessing  a  remote  memory,  and  in  part  by  replicating  data  so  that  many  processes  can  read 
it  in  parallel.  Coherent  shared  memory,  however,  constrains  caches  to  be  managed  so  that  the 
read and write operations appear to be atomic (see footnote 1). These constraints require synchronization
between  readers  and  writers,  and  also  require  communication  to  update  or  invalidate  caches 
after  a  processor  has  written  to  memory. 

An  alternative  to  maintaining  cache  coherence  is  to  delegate  the  management  of  cached 
copies  to  the  application.  The  advantage  of  this  approach  is  that  the  cache  management 
algorithm  can  be  tailored  to  the  needs  of  the  application.  The  disadvantage  is  that  programs 
could  become  significantly  more  complex.  However,  we  believe  that  this  complexity  can  be 
managed  by  encapsulating  cache  management  algorithms  in  the  implementations  of  abstract 
data  types. 

Multi-version  memory  is  an  example  of  a  memory-like  abstraction  that  can  take  advantage 
of  local  caches,  but  requires  less  synchronization  and  communication  than  coherent  shared 
memory. Abstractly, the state of a multi-version object at any point in time is a sequence
of versions. The first version in the sequence is the initial version, and the last version is
the current version. Writers update the object by extending the sequence with additional
versions (thus changing the current version), and readers read the object by choosing and
reading some version. The specification allows a reader to read any version, not just the
current version. This nondeterminism allows us to implement a multi-version memory so
that readers can run in parallel with writers. In addition, propagation of an update to
cached copies can be done lazily in the background; thus, the invalidation (or update) load
for a heavily shared object can be spread out over time, and processes need not wait for
propagation to complete.

Footnote 1: There are a number of subtly different correctness criteria that have been used for
coherent shared memory, including sequential consistency [183] and linearizability [144]. We will
take linearizability as our definition of correctness.

An  application  that  can  use  a  multi-version  memory  will  probably  perform  better  if  readers 
obtain fairly recent versions, but the specification of a multi-version memory requires the
application to be prepared for a reader to obtain an arbitrary version. Additional constraints
could be added to the specification. For example, we might require a process reading a
multi-version object to choose a version that is no older than any other version already used by the
process.  Alternatively,  we  could  require  it  to  choose  one  of  the  k  most  recent  versions.  Such 
constraints  are  not  needed  for  the  applications  we  have  studied  so  far.  However,  they  might 
be  useful  for  some  applications,  and  should  have  little  negative  impact  on  performance. 
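
The specification of multi-version memory and the two optional constraints can be made concrete with a single-threaded Python sketch. The class names MultiVersionObject and CachingReader, the explicit refresh method (standing in for lazy background propagation), and the max_lag parameter are all assumptions made for this example; a real implementation would be distributed and concurrent.

```python
class MultiVersionObject:
    """Abstract state: a sequence of versions; the last one is the current version."""

    def __init__(self, initial):
        self._versions = [initial]

    def write(self, value):
        # A writer extends the sequence, which changes the current version.
        self._versions.append(value)

    def current_index(self):
        return len(self._versions) - 1

    def get(self, index):
        return self._versions[index]


class CachingReader:
    """A reader that normally answers from its local cached copy."""

    def __init__(self, obj):
        self._obj = obj
        self._index = obj.current_index()
        self._value = obj.get(self._index)

    def refresh(self):
        # Stands in for lazy propagation of updates. The cache only moves
        # forward, so the reader never sees a version older than one it has
        # already used (the first optional constraint in the text).
        newest = self._obj.current_index()
        if newest > self._index:
            self._index = newest
            self._value = self._obj.get(newest)

    def read(self, max_lag=None):
        # With max_lag=k, the version returned is one of the k most recent
        # (the second optional constraint); with None, any cached version is
        # acceptable and the read never consults the shared object.
        if max_lag is not None:
            if self._obj.current_index() - self._index >= max_lag:
                self.refresh()
        return self._value
```

In this sketch an unconstrained read costs only a local access, while a bounded read has to learn the current version number; how cheaply that can be done in a real machine is outside the scope of the example.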

Brad  Spiers  has  been  studying  the  performance  of  different  implementations  of  multi-version 
memory  and  comparing  their  performance  to  that  of  coherent  shared  memory.  He  has  been 
using  a  simple  simulator,  built  by  Wilson  Hsieh,  for  a  shared-memory  multiprocessor.  The 
simulations  show,  as  expected,  that  multi-version  memory  and  coherent  shared  memory 
take  approximately  the  same  amount  of  time  for  workloads  in  which  either  all  processes  only 
read  or  in  which  all  processes  only  write.  Simulations  are  currently  being  done  for  mixed 
workloads, in which processes both read and write. Our expectation is that multi-version
memory  should  provide  better  performance  than  coherent  shared  memory  in  these  cases. 
The  simulations  will  help  us  understand  the  magnitude  of  the  performance  difference,  and 
how  it  is  affected  by  variations  in  the  number  of  processors  and  in  the  time  required  for 
accesses  to  global  memory.  They  will  also  help  us  understand  how  often  a  reader  obtains  a 
version  other  than  the  current  version;  this  information  will  help  in  predicting  the  utility  of 
multi-version  memory  for  various  applications. 

As discussed in Section 7.4.1, multi-version memory can be used in concurrent B-tree
algorithms to reduce contention and mask network latency. In general, it can be used in any
application  in  which  a  process  reading  data  can  tolerate  reading  an  old  version  of  the  data. 
For  example,  in  asynchronous  iterative  relaxation  algorithms,  a  process  computing  a  new 
value  for  one  point  can  use  values  for  other  points  from  any  previous  iteration.  Similarly,  in 
a  branch-and-bound  search  algorithm  implemented  using  several  worker  processes,  it  is  not 
necessary  for  each  process  to  know  the  most  recent  value  of  the  global  bound  representing 
the  best  solution  found  so  far. 

The  advantages  of  multi-version  memory  over  coherent  shared  memory  suggest  that  it  may 
be  fruitful  to  view  cache  management  as  an  application-level  replication  problem,  where  both 
the  semantics  of  the  shared  data  and  the  algorithm  used  to  manage  caches  can  be  designed  as 
part of the application. Such an approach fits naturally into an object-oriented programming
style based on inventing application-specific abstract data types, such as that advocated by
Liskov and Guttag [203]. Future research will consider what primitives should be provided
by  the  hardware  and  by  the  programming  language  to  support  this  kind  of  software  cache 
management. 




7.4.4  Distributed  Locking 

Read-write  locks  are  used  in  many  concurrent  data  structures.  The  traditional  method  for 
implementing  read-write  locks,  and  many  other  synchronization  primitives,  is  by  use  of  a 
monitor  [147].  A  monitor  protects  synchronization  data  with  a  mutual  exclusion  lock;  any 
process  that  wishes  to  access  the  synchronization  data  (e.g.,  to  acquire  a  read  lock,  or  to 
release a write lock) must first acquire the mutual exclusion lock. To implement a
read-write lock, the monitor might contain a count of the number of readers and a flag indicating
whether a writer is currently active.
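
The monitor-based read-write lock just described can be sketched in Python using threading.Condition, which bundles a mutual exclusion lock with a wait queue. The class and method names are assumptions made for this example, and the sketch makes no attempt at fairness between readers and writers.

```python
import threading


class MonitorReadWriteLock:
    """Read-write lock whose state is a reader count and a writer flag."""

    def __init__(self):
        # The condition's internal lock is the monitor's mutual exclusion lock.
        self._monitor = threading.Condition()
        self._readers = 0
        self._writer_active = False

    def acquire_read(self):
        with self._monitor:
            while self._writer_active:
                self._monitor.wait()
            self._readers += 1

    def release_read(self):
        with self._monitor:
            self._readers -= 1
            if self._readers == 0:
                self._monitor.notify_all()

    def acquire_write(self):
        with self._monitor:
            while self._writer_active or self._readers > 0:
                self._monitor.wait()
            self._writer_active = True

    def release_write(self):
        with self._monitor:
            self._writer_active = False
            self._monitor.notify_all()
```

Every operation, including acquire_read, passes through the single monitor lock, so even readers that never conflict with one another are serialized at that point.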

Because  the  data  in  a  monitor  can  be  accessed  by  only  one  process  at  a  time,  the  monitor 
itself  can  be  a  serious  source  of  contention.  In  many  cases,  this  contention  is  logically 
unnecessary.  For  example,  processes  acquiring  read  locks  do  not  need  to  synchronize  with 
each  other;  however,  processes  acquiring  write  locks  do  need  to  synchronize  with  each  other 
and  with  processes  acquiring  read  locks.  A  distributed  locking  strategy  can  be  used  to  avoid 
this  unnecessary  contention. 

We  designed  a  distributed  locking  strategy  based  on  the  idea  of  caching  locks  as  well  as 
data.  If  a  lock  is  already  cached  in  a  particular  mode,  an  acquisition  request  for  the  lock 
in that mode can be satisfied without communicating with other processes. Otherwise, the
acquisition  request  is  handled  by  ensuring  that  no  process  has  the  lock  cached  in  a  conflicting 
mode.  Similar  strategies  have  been  used  in  distributed  file  systems  and  in  the  Vaxcluster 
lock  manager. 
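
The following Python sketch illustrates the idea of caching locks by counting the messages each acquisition would need. It is a bookkeeping model only, not the group's design: the class CachedLockManager, the process identifiers, and the cost model (one request message plus one message per revoked cache entry) are assumptions made for this example, and the sketch does not model whether a cached lock is in use at the moment it is revoked.

```python
class CachedLockManager:
    """Tracks which process has a single lock cached, and in which mode."""

    def __init__(self):
        self.cached = {}    # process id -> "read" or "write"
        self.messages = 0   # total messages under the assumed cost model

    @staticmethod
    def _conflicts(held, requested):
        # Two read modes are compatible; anything involving a write conflicts.
        return held == "write" or requested == "write"

    def acquire(self, pid, mode):
        """Acquire the lock for `pid`; return the number of messages needed."""
        if mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        if self.cached.get(pid) == mode:
            return 0  # already cached in this mode: no communication
        sent = 1      # assumed: one request message
        for other, held in list(self.cached.items()):
            if other != pid and self._conflicts(held, mode):
                del self.cached[other]  # revoke the conflicting cached lock
                sent += 1               # assumed: one message per revocation
        self.cached[pid] = mode
        self.messages += sent
        return sent
```

Under this model, repeated read acquisitions are free once the lock is cached, while a write acquisition pays for every reader that holds the lock cached.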

Caching  locks  allows  readers  to  run  without  interfering  with  each  other.  However,  the  cost  of 
acquiring  a  write  lock  can  be  quite  high,  since  it  involves  synchronizing  with  all  readers  that 
have  the  lock  cached.  We  are  experimenting  with  other  distributed  locking  strategies,  based 
on  software  combining  trees,  that  reduce  the  cost  of  write  locks  with  only  minimal  increase 
in  the  cost  of  read  locks.  Wilson  Hsieh  is  using  a  combination  of  analysis  and  simulation  to 
understand the tradeoffs between the cost of read locks and the cost of write locks.

For  the  simulations,  Wilson  built  a  simulator  for  a  shared-memory  multiprocessor  that  allows 
the  user  to  write  parallel  programs  in  Scheme,  using  special  functions  to  access  shared  objects. 
The  simulator  simulates  a  parallel  machine  with  an  arbitrary  number  of  processors,  each  of 
which  has  some  local  memory.  There  is  also  a  global  memory,  which  must  be  accessed  over 
the  network.  The  simulator  does  not  simulate  network  contention  (e.g.,  tree  saturation  of 
the  network),  but  does  simulate  contention  on  objects  in  the  global  memory.  Each  object  in 
global  memory  is  treated  as  if  it  has  infinite  memory  to  queue  memory  requests.  The  relative 
costs  of  network  accesses  and  local  operations  can  be  varied  by  the  user.  The  simulator  is 
also  being  used  by  other  students  for  other  experiments. 
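To illustrate the contention model described above, here is a small sketch in Python (the actual simulator runs Scheme programs; this function and its parameters are hypothetical). It models a single global-memory object with an unbounded request queue and a fixed network delay, and it ignores network contention, as the text says the simulator does.

```python
def simulate_object(arrivals, network_delay, service_time):
    """Return completion times for requests to one global-memory object.

    arrivals: times at which processors issue their requests.
    The object queues requests without bound and serves them one at a time.
    """
    free_at = 0
    done = []
    for issued in sorted(arrivals):
        reaches_object = issued + network_delay   # no network contention modeled
        start = max(reaches_object, free_at)      # wait while the object is busy
        free_at = start + service_time
        done.append(free_at + network_delay)      # the reply travels back
    return done


# Three processors hit the same object at time 0.
completion_times = simulate_object([0, 0, 0], network_delay=10, service_time=2)
```

Varying `network_delay` relative to `service_time` corresponds to varying the relative costs of network accesses and local operations.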

7.5  Performance  Specifications 

A  program  has  a  performance  bug  when  some  cost  of  its  execution  (e.g.,  response  time,
throughput,  or  resource  utilization)  is  higher  than  it  is  supposed  to  be.  Performance
debugging  is  the  process  of  detecting,  locating,  and  eliminating  performance  bugs.  Building
programs  that  perform  well  typically  involves  a  combination  of  designing  for  good
performance  from  the  start  and  doing  performance  debugging  when  problems  show  up  after  the
program  is  implemented  and  running.  In  many  situations,  the  implementor  of  a  program
does  not — and  should  not  be  expected  to — understand  the  implementation  of  the  underlying 
system  at  all  levels.  Most  application  programmers  trying  to  make  their  programs  run  fast 
will  not  have  detailed  knowledge  about  the  implementation  of  the  underlying  operating
system,  yet  application  performance  is  affected  by  operating  system  performance,  which  itself
is  affected  by  the  manner  in  which  the  application  uses  the  operating  system. 

It  has  long  been  common  software  engineering  practice  to  provide  functional  specifications 
for  program  modules  so  that  a  client  of  a  module  need  not  know  how  it  is  implemented  in 
order  to  use  it.  A  functional  specification  tells  the  client  exactly  what  functionality  he  can 
expect  from  the  module,  and  a  functional  bug  exists  when  a  module  fails  to  meet  its
specification.  In  the  same  way,  a  performance  bug  is  a  failure  to  meet  a  performance  specification.
Yet,  precise  performance  specifications  are  even  rarer  than  precise  functional  specifications. 
At  best,  performance  specifications  tend  to  be  vaguely  stated,  and  at  worst,  they  exist  only 
in  implementors’  and  users’  minds  as  some  expectation  of  execution  cost.  Consequently, 
mistaken  performance  expectations  are  difficult  to  detect  and  correct,  and  when
performance  bugs  do  exist,  the  person  doing  the  debugging  must  have  extensive  knowledge  of  the
implementation  of  large  parts  of  the  system. 

For  her  Ph.D.  thesis,  Sharon  Perl  is  developing  a  method  for  writing  performance
specifications  of  concurrent  systems.  The  goal  is  to  make  it  easier  for  programmers  to  tell  when  a
program  has  a  performance  bug  and  to  determine  which  part  of  the  program  is  at  fault.  The 
thesis  of  this  research  is  that  having  explicit  and  precise  performance  specifications  makes  it 
easier  to  build  programs  that  perform  well  and,  furthermore,  that  writing  such  specifications 
is  a  practical  endeavor.  The  latter  claim  will  be  demonstrated  empirically,  by  developing
performance  specifications  for  a  real  software  system.  The  work  focuses  on  specifying  response
times,  although  we  hope  the  work  will  extend  to  other  aspects  of  performance.  The  former 
claim  will  be  supported  through  argument  and  reports  of  experience  with  the  specifications 
that  are  developed. 
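As a rough illustration of how an explicit response-time specification could help locate the module at fault, consider the sketch below. It is not the specification notation under development; the `ResponseTimeSpec` record, the module names, and the numbers are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ResponseTimeSpec:
    module: str
    operation: str
    bound_ms: float   # response time the module promises not to exceed


def find_violations(specs, measurements):
    """Return (module, operation, observed_ms) for each measurement over its bound.

    measurements: iterable of (module, operation, observed_ms) tuples.
    """
    bounds = {(s.module, s.operation): s.bound_ms for s in specs}
    return [
        (mod, op, ms)
        for mod, op, ms in measurements
        if (mod, op) in bounds and ms > bounds[(mod, op)]
    ]


specs = [
    ResponseTimeSpec("file_server", "read", 20.0),
    ResponseTimeSpec("disk", "read_block", 15.0),
]
trace = [("file_server", "read", 35.0), ("disk", "read_block", 12.0)]
violations = find_violations(specs, trace)
```

Here the file server misses its bound while the disk module it depends on meets its own, which points the search at the file server layer rather than at the layers beneath it.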

The  context  for  this  work  is  concurrent  systems,  ranging  from  multitasking  uniprocessor 
systems  to  small-to-medium  scale  multiprocessors  to  distributed  systems.  We  are  not
considering  highly  parallel  systems  or  applications  (e.g.,  Connection  Machines),  though  the
results  may  be  of  some  use  in  that  domain.  We  assume  that  the  software  to  be  specified 
has  a  modular  structure,  i.e.,  that  it  is  decomposable  into  modules  or  subsystems  for  which
performance  specifications  may  be  written.  An  underlying  assumption  of  this  work  is  that 
performance  bugs  exist  that  are  not  often  detected,  or  that  are  caught  fairly  long  after  they 
appear.  This  is  a  reasonable  assumption  in  our  experience,  particularly  for  systems  where 
good  performance  is  desirable  but  is  not  the  primary  concern  of  the  implementors  (e.g.,
systems  in  research  environments). 

This  work  should  have  the  following  major  contributions: 

•  An  approach  to  writing  performance  specifications  that  identifies  a  structure  and
content  of  specifications  appropriate  for  their  use  as  documentation  and  in  performance
debugging; 

•  A  methodology  for  using  performance  specifications  for  performance  debugging; 

•  Actual  performance  specifications  for  a  significant  part  of  a  distributed  file  system;

•  A  methodology  for  developing  performance  specifications;  and 

•  An  initial  version  of  a  language  or  notation  for  writing  performance  specifications. 

The  method  will  be  demonstrated  by  specifying  the  performance  of  significant  parts  of  a 
replicated  distributed  file  system. 

7.6  Atomic  Garbage  Collection 

Transactions,  used  in  database  and  distributed  systems,  provide  fault-tolerance  by  masking 
failures  that  occur  while  they  are  running.  Automatic  storage  management,  used  in  modern 
programming  languages,  enhances  reliability  by  preventing  errors  due  to  explicit  deallocation 
(e.g.,  dangling  references  and  storage  leaks).  A  uniform  storage  model  simplifies
programming  by  eliminating  the  distinction  between  accessing  temporary  storage  and  permanent
storage.  We  call  storage  that  is  managed  automatically  using  garbage  collection,
manipulated  using  atomic  transactions,  and  accessed  uniformly,  a  stable  heap.  For  his  Ph.D.  thesis,
Elliot  Kolodner  is  designing  and  prototyping  algorithms  for  managing  a  large  stable  heap. 
Stable  heap  management  will  make  it  easier  to  write  reliable  programs  and  could  be  useful  in 
programming  languages  for  reliable  distributed  computing  [196]  [90],  programming  languages 
with  persistent  storage  [10][18],  and  object  oriented  database  systems  [66] [211] [293] [299]. 

A  programmer  views  a  stable  heap  as  a  single-level  store.  The  heap  is  stored  on  disk,  with 
pages  brought  into  primary  memory  as  needed.  The  heap  has  a  designated  set  of  root  objects. 
Not  all  objects  are  treated  as  stable;  instead  the  set  of  roots  is  partitioned  into  stable  and 
volatile  subsets,  and  an  object  is  stable  if  and  only  if  it  is  accessible  from  one  of  the  stable 
roots.
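The stability rule (an object is stable if and only if it is accessible from a stable root) amounts to a reachability computation. A minimal sketch, assuming the heap is represented as a dictionary from object ids to the ids they point to (a representation chosen only for this example):

```python
def stable_objects(heap, stable_roots):
    """Return the set of object ids reachable from the stable roots.

    heap: dict mapping object id -> list of object ids it points to.
    """
    stable, stack = set(), list(stable_roots)
    while stack:
        obj = stack.pop()
        if obj in stable:
            continue
        stable.add(obj)
        stack.extend(heap.get(obj, []))
    return stable


# "s" is a stable root and "v" a volatile root; "b" is shared by both.
heap = {"s": ["a"], "a": ["b"], "b": [], "v": ["b", "c"], "c": []}
stable = stable_objects(heap, ["s"])
```

Object "c" is reachable only from the volatile root, so it is not stable, while the shared object "b" is.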

Computations  run  as  atomic  transactions  [132].  Objects  are  created  and  modified  by
transactions;  an  object  becomes  stable  if  a  pointer  to  it  is  placed  in  an  existing  stable  object  by
a  transaction  that  commits.  A  garbage  collector  reclaims  an  object’s  storage  automatically 
when  the  object  is  no  longer  accessible  from  any  of  the  roots.  In  addition,  a  recovery  system 
ensures  that  stable  objects  survive  failures:  modifications  performed  by  aborted
transactions  are  undone,  while  modifications  performed  by  committed  transactions  are  guaranteed
to  survive  both  system  crashes  and  media  failures. 

Automatic  storage  management  for  a  stable  heap  is  complicated  by  the  fact  that  a  garbage 
collector  typically  moves  and  modifies  objects.  Collectors  move  objects  to  improve  paging 
performance;  they  modify  objects  to  reduce  the  amount  of  additional  storage  needed  by  the 
collector  itself.  The  movement  and  modification  of  objects  during  garbage  collection  requires 
coordination  with  the  recovery  system.  A  collection  algorithm  for  a  stable  heap  that  solves 
these  problems  is  called  an  atomic  garbage  collector. 

Many  applications,  such  as  computer-aided  design,  computer-aided  software  engineering,  and 
office  information  systems,  require  large  amounts  of  storage,  timely  responses  for
transactions,  and  high  availability.  Our  earlier  research  produced  an  atomic  garbage  collector  and
recovery  system  suitable  for  small  stable  heaps  [175][174].  In  that  work,  the  atomic  collector
is  based  on  a  stop-the-world  copying  collector  and  the  recovery  system  uses  a  shadowing
technique  that  requires  a  traversal  of  the  stable  object  graph  after  a  crash.  These  algorithms 
are  not  suitable  for  applications  with  large  heaps:  a  stop-the-world  garbage  collector  may 
delay  transactions  arbitrarily,  and  the  traversal  of  the  stable  state  slows  recovery  after  a 
failure,  reducing  availability. 

The  goal  of  our  current  research  is  to  design  an  integrated  atomic  garbage  collector  and 
recovery  system  appropriate  for  a  large  stable  heap  on  stock  hardware.  The  collector  must 
be  incremental  and  atomic:  it  cannot  attempt  to  collect  the  whole  heap  in  one  pause  and  it 
must  interact  correctly  with  the  recovery  system.  The  time  for  recovery  must  be  independent 
of  heap  size  and  adjustable  to  be  arbitrarily  short  using  checkpoints. 

During  the  past  year,  we  completed  the  design  of  the  algorithms.  Our  approach  divides  the 
heap  into  volatile  and  stable  areas.  Objects  are  created  in  the  volatile  area.  After  an  object 
becomes  stable  it  is  moved  to  the  stable  area  at  an  appropriate  time.  The  volatile  area  can 
be  collected  using  incremental  or  generational  collection.  We  designed  an  incremental  atomic 
garbage  collector  based  on  the  ideas  of  Ellis,  Li,  and  Appel  [105]  to  collect  the  stable  areas. 
The  collector  interacts  correctly  with  the  recovery  system,  and  it  writes  enough  information 
to  the  log  (maintained  by  the  recovery  system)  to  avoid  redoing  the  entire  collection  after  a 
crash. 

We  have  worked  out  the  details  for  two  approaches  to  recovery:  (1)  write-ahead  logging  with 
update-in-place,  and  (2)  a  variation  of  intentions  lists.  In  both  approaches  the  information 
associated  with  stable  objects  maintained  for  active  transactions  is  kept  in  the  volatile  area 
separate  from  the  objects  themselves.  This  allows  recovery  without  a  traversal  of  the  object 
graph  and  lowers  the  space  overhead  for  stable  objects.  For  both  approaches,  the  location  of 
a  committed  object  version  in  the  stable  area  does  not  change;  it  is  updated  in  place.  This 
avoids  creating  garbage  in  the  stable  area  and  lowers  the  space  overhead  for  recovery. 
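For readers unfamiliar with the first approach, the following is a generic textbook-style sketch of write-ahead logging with update-in-place. It is not the algorithm designed in this work: the `Store` class is hypothetical, `None` is used as a stand-in for "object did not exist", and the sketch assumes that transactions hold write locks until they commit, so uncommitted updates never precede committed updates to the same object.

```python
class Store:
    """Toy write-ahead log with update-in-place (hypothetical structure)."""

    def __init__(self):
        self.stable = {}   # object id -> value; stands in for the stable area
        self.log = []      # append-only log, assumed to survive crashes

    def write(self, txn, obj, new):
        # Log the old and new values BEFORE updating the stable copy in place.
        self.log.append(("update", txn, obj, self.stable.get(obj), new))
        self.stable[obj] = new

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo committed updates in log order.
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                self.stable[rec[2]] = rec[4]
        # Undo uncommitted updates in reverse log order.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                if rec[3] is None:
                    self.stable.pop(rec[2], None)
                else:
                    self.stable[rec[2]] = rec[3]


store = Store()
store.write("t1", "x", 1)
store.commit("t1")
store.write("t2", "x", 2)   # t2 never commits
store.write("t2", "y", 5)
store.recover()             # simulate restart after a crash
```

Recovery here scans only the log and never traverses the object graph, which is the property the text emphasizes.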

While  designing  the  new  recovery  system,  we  found  a  bug  in  the  design  of  the  current  Argus 
recovery  system.  Because  of  the  bug,  a  transaction  might  commit  before  values  for  all  the 
objects  accessible  from  the  object  that  it  modified  have  been  written  to  stable  storage.  Thus, 
after  a  crash  the  effects  of  some  committed  transactions  might  not  be  recoverable.  In  a  design 
note  [173],  we  describe  new  recovery  algorithms  that  overcome  this  bug. 

Currently,  we  are  implementing  a  prototype  of  a  stable  heap  to  show  the  feasibility  of  our 
design.  The  current  implementation  of  Argus  [201]  serves  as  the  basis  for  the  prototype;  we 
are  replacing  its  storage  management  and  recovery  algorithms. 

7.7  Communication  in  Heterogeneous  Distributed  Systems 

To  send  a  message  in  a  heterogeneous  distributed  system,  it  may  be  necessary  for  the  sender 
or  receiver  of  the  message,  or  both,  to  translate  the  data  in  the  message  to  or  from  its 
internal  format.  The  typical  method  for  handling  this  problem  is  to  define  a  single  standard 
representation  for  each  data  type  to  be  used  in  messages.  If  a  module’s  internal  data  uses  a 
different  representation,  it  must  translate  the  data  to  or  from  the  standard  to  send  or  receive 
it.  If  two  communicating  modules  use  the  same  internal  representation  that  differs  from  the 
standard  representation,  this  scheme  results  in  unnecessary  translation  by  both  modules. 
An  alternative  approach  is  to  define  multiple  representations  for  each  data  type  to  be  used
for  communication.  By  choosing  a  representation  for  the  data  in  a  message  that  matches  its 
internal  representation,  the  sender  of  the  message  can  avoid  excess  translation. 

For  his  Ph.D.  thesis,  Earl  Waldin  is  designing  a  communication  system  that  allows  multiple 
representations  of  data  types  to  be  used  in  messages.  Some  existing  systems  allow  multiple 
representations  to  be  defined  for  transmitting  simple  scalar  data  such  as  integers  and
characters.  Multiple  representations  can  also  be  useful  for  more  complex  data  structures,  and  in
general  for  arbitrary  abstract  data  types.  Allowing  multiple  representations  has  two  main 
advantages:  it  can  reduce  the  amount  of  translation  involved  in  sending  messages,  and  it 
allows  the  system  to  evolve  easily  by  adding  or  changing  the  representations  of  data  used  in 
messages. 
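The saving in translation can be made concrete with a small sketch. The function below is illustrative only: its name, its parameters, and the simple "prefer the sender's format" rule are assumptions for the example and do not describe the negotiation algorithms of the system.

```python
def choose_representation(sender_internal, receiver_internal, allowed, standard):
    """Pick a wire representation and count the translations it requires.

    allowed: representations the data type defines for use in messages.
    Returns (wire_representation, number_of_translations).
    """
    candidates = [r for r in (sender_internal, receiver_internal) if r in allowed]
    wire = candidates[0] if candidates else standard
    translations = int(wire != sender_internal) + int(wire != receiver_internal)
    return wire, translations


# Single standard representation: two little-endian peers both translate.
single = choose_representation("little", "little", {"big"}, "big")
# Multiple representations: the same peers translate nothing.
multiple = choose_representation("little", "little", {"big", "little"}, "big")
# Peers that differ still need one translation, at the receiver.
mixed = choose_representation("little", "big", {"big", "little"}, "big")
```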

A  prototype  is  being  designed  as  part  of  the  Mercury  system.  In  Mercury,  a  distributed 
program  is  composed  of  distinct  entities  that  communicate  with  one  another,  where  an 
entity  resides  entirely  at  a  single  node.  Entities  are  implemented  by  different  mechanisms  in 
different  programming  languages.  A  receiving  entity  provides  one  or  more  ports,  which  are
procedures  that  can  be  called  remotely  from  other  entities.  The  arguments  and  results  of
port  invocations  are  passed  by  value;  there  is  no  way  for  references  to  data  in  one  entity  to
be  sent  to  other  entities.  The  transmitted  values  belong  to  value-spaces  (Vspaces),  which  are
similar  to  types  in  programming  languages.  The  term  Vspace  is  used  to  emphasize  that  data 
is  passed  by  value,  and  that  the  data  in  a  message  has  no  operations  that  can  be  performed 
on  it.  Abstract  Vspaces  are  user-defined  Vspaces;  their  representations  are  defined  in  terms 
of  other  abstract  and  built-in  Vspaces. 

Over  the  past  year,  algorithms  for  negotiating  representations  to  be  used  in  messages  have 
been  designed.  Current  work  involves  the  design  of  the  interface  description  language  (IDL) 
and  annotations  for  its  use  with  the  C  programming  language.  The  IDL  is  used  to  give  a 
language-independent  description  of  a  program  interface  in  terms  of  ports  and  Vspaces.  The 
language-specific  annotations  are  used  to  describe  how  a  given  program  uses  that  interface. 
For  example,  the  annotations  may  describe  mappings  between  types  in  the  program  and 
Vspaces  in  the  interface.  The  annotations  also  describe  interactions  between  the  program 
and  the  communications  substrate  that  implements  the  Mercury  protocols. 

As  part  of  defining  the  IDL,  we  studied  the  feasibility  of  adding  interface  types  to  Mercury 
along  with  a  mechanism  for  dynamically  checking  their  use.  Loosely  stated,  an  interface 
type  plays  a  similar  role  to  that  of  an  abstract  data  type  in  a  programming  language.  More 
specifically,  an  interface  is  a  collection  of  ports  (i.e.,  operations)  through  which  a  client  may 
access  a  subsystem.  A  subsystem  in  turn  consists  of  one  or  more  entities  that  together 
provide  a  service.  Each  port  in  the  interface  is  provided  by  a  single  entity;  invocation  of 
a  port  results  in  a  remote  procedure  call  to  the  corresponding  entity.  To  a  client,  then,  a 
subsystem  appears  as  a  (distributed)  object  that  responds  to  invocations  of  the  ports  in  its 
interface.  The  behavior  of  a  subsystem  is  determined  by  an  interface  to  which  we  assign  a 
type.  A  given  subsystem  corresponds  to  an  instance  of  a  type. 

Type  checking  the  use  of  interfaces  is  more  difficult  than  type  checking  the  use  of  abstract 
data  types  in  a  program.  The  greatest  difficulties  arise  because  subsystem  interfaces  are 
constructed  dynamically  and  because  subsystems  may  evolve.  As  an  example  of  the  first,
consider  the  following  subsystem:  Entity  E_A  constructs  an  instance  A  of  interface  I_A
containing  only  ports  that  it  provides  and  then  exports  this  instance  (e.g.,  by  putting  it  in  some
catalog).  Entity  E_B  imports  A  (e.g.,  by  looking  it  up  in  the  catalog)  and  constructs  an
instance  B  of  interface  I_B  by  combining  some  of  the  ports  of  A  with  ports  that  it  provides.
It  then  exports  B.  The  subsystem,  then,  consists  of  entities  E_A  and  E_B  and  its  interface  is
described  by  I_B.  A  client  can  then  import  B  and  invoke  its  ports.  The  client  only  knows
about  I_B,  and  from  its  point  of  view,  all  the  ports  in  B  belong  to  I_B,  including  those
implemented  by  E_A.  From  E_A’s  point  of  view  its  ports  belong  to  I_A.  To  type-check  the  client’s
use  of  a  port  from  E_A,  we  need  to  know  the  relationship  between  I_A  and  I_B.  In  addition,
we  would  like  to  detect  if  E_B  made  an  error  in  constructing  B.  This  requires  that  the  client
and  E_A  exchange  information  at  runtime.

To  illustrate  the  problem  of  evolution,  consider  replacing  E_A  in  the  above  example  with  a
new  version  E'_A  providing  an  interface  I'_A  that  is  compatible  with  I_A,  in  that  every  port  in
I_A  is  also  in  I'_A.  Since  clients  may  still  have  ports  exported  using  I_A,  E'_A  needs  to  know  the
relationship  between  I'_A  and  I_A,  as  well  as  that  between  I'_A  and  I_B.
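The compatibility condition used in this example (every port of the old interface is also in the new one) can be written as a simple check. In the sketch below an interface is modeled as a dictionary from port names to signatures; that representation, the requirement that signatures match exactly, and the port names are assumptions made for illustration.

```python
def is_compatible(new_ports, old_ports):
    """True if every port of the old interface appears unchanged in the new one."""
    return all(
        name in new_ports and new_ports[name] == signature
        for name, signature in old_ports.items()
    )


# Hypothetical interfaces: port name -> (argument Vspaces, result Vspace).
old_interface = {"read": (("handle",), "bytes"), "close": (("handle",), "unit")}
new_interface = dict(old_interface, stat=(("handle",), "record"))

upgrade_ok = is_compatible(new_interface, old_interface)     # ports only added
downgrade_ok = is_compatible(old_interface, new_interface)   # "stat" is missing
```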

Initial  research  indicates  that  it  may  be  feasible  to  provide  interface  types  and  type  checking. 
Doing  so  requires  that  the  programmer  describe  the  relationship  between  an  interface
exported  by  an  entity  and  that  of  the  subsystem  to  which  the  entity  belongs.  This  relationship
can  be  described  statically.  Furthermore,  it  imposes  an  order  on  the  replacement  of  entities
in  a  subsystem.  Further  research  is  needed  to  determine  if  these  constraints  are  acceptable.

7.8  Transaction  Processing 

Transaction  systems  are  becoming  widely  used  in  database  systems,  office  automation
systems,  and  distributed  systems.  Implementations  of  transaction  systems  are  large  and
complex,  and  involve  subtle  algorithms  that  interact  in  poorly  understood  ways.  Formal
techniques  can  be  very  useful  in  managing  this  complexity.  In  joint  work  with  Alan  Fekete,
Nancy  Lynch,  and  Michael  Merritt,  William  Weihl  has  continued  the  development  of  a  formal
model  for  transaction  systems  that  simplifies  the  description  and  verification  of
transaction-processing  algorithms.  The  model  is  quite  general,  allowing  a  wide  range  of  algorithms  to
be  described.  In  the  last  year,  we  completed  the  description  and  verification  of  a  general 
locking  algorithm  [112].  We  also  generalized  proof  techniques  based  on  serialization  graphs, 
originally  developed  for  single-level  transaction  systems,  to  nested  transaction  systems  [113]. 
In  the  original  work  on  serialization  graphs,  recovery  was  essentially  ignored  by  considering 
only  executions  in  which  all  transactions  commit.  Our  work  clarifies  the  interactions  with 
recovery  by  making  explicit  the  assumptions  about  recovery  that  are  implicit  in  earlier  work. 

In  other  work,  William  Weihl  has  analyzed  the  interactions  between  concurrency  control 
and  recovery  in  transaction  systems  [289].  There  has  been  little  previous  theoretical  work 
on  recovery,  and  the  extensive  theoretical  literature  on  concurrency  control  ignores  recovery. 
However,  not  all  “correct”  recovery  algorithms  work  with  all  “correct”  concurrency  control 
algorithms.  We  have  developed  techniques  for  analyzing  the  interactions  between
concurrency  control  and  recovery,  and  have  used  them  to  give  necessary  and  sufficient  conditions  for
a  concurrency  control  algorithm  to  work  with  each  of  several  different  recovery  algorithms. 
Our  analysis  of  the  interactions  between  concurrency  control  and  recovery  also  suggests  a
useful  methodology  for  verifying  concurrency  control  and  recovery  algorithms  that  allows
their  interactions  to  be  ignored  as  much  as  possible:  first,  give  a  very  abstract  description  of
both  the  concurrency  control  and  the  recovery  algorithm.  This  description  can  be  used  to 
analyze  their  interactions  without  getting  overwhelmed  by  details  of  either  algorithm,  and 
to  prove  that  the  combination  of  the  two  algorithms  is  correct  in  the  sense  that  they  ensure 
that  transactions  are  atomic.  Second,  describe  each  algorithm  in  detail,  and  show  that  the 
detailed  algorithm  implements  the  more  abstract  algorithm.  In  this  second  step,  the  details 
of  concurrency  control  can  be  ignored  when  showing  that  the  detailed  recovery  algorithm  is 
correct,  and  vice-versa.  This  proof  methodology  seems  to  lead  to  much  simpler  proofs. 

8.1  Mercury  System  Development

Rob  Austein,  Karen  Sollins,  and  John  Wroclawski  continued  to  develop  the  Mercury  system. 
Recently,  they  implemented  a  second  generation  version  of  Mercury  for  C/Unix  programmers. 
This  system  combines  work  in  the  areas  of  RPC  semantics,  heterogeneous  communication,
and  binding  architectures  to  form  a  substrate  for  multi-architecture  distributed  computing. 
We  summarize  our  work  in  each  of  these  areas  separately  because  each  result  is  useful  on  its 
own. 

Programs  in  the  Mercury  system  communicate  using  pipelined  sequences  of  remote  procedure 
calls  known  as  call  streams.  Rob  Austein  and  John  Wroclawski  developed  and  implemented 
a  practical  design  for  call  streams.  We  defined  the  behavior  of  call  streams  in  the  presence  of
flow  control  and  limited  memory  resources.  We  specified  a  standardized  network  transport 
layer  model  called  the  Mercury  Virtual  Transport.  This  model  specifies  the  services  Mercury 
expects  from  the  underlying  network.  We  designed  a  protocol  which  implements  the  full 
semantics  of  call  streams  using  this  transport  model.  Use  of  the  virtual  transport  concept 
allows  us  to  specify  the  call  stream  protocol  in  a  network-independent  fashion.  We  have 
defined  a  mapping  protocol  which  implements  the  virtual  transport  over  traditional  byte 
stream  transports  such  as  TCP  or  ISO  TP4.  We  have  used  this  work  to  implement  call 
streams  over  both  Unix  intra-machine  and  IP/TCP  transport  protocols.
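The pipelining behavior of a call stream (the caller issues a sequence of calls without waiting for each reply, and the calls are processed in order) can be imitated locally with a single-worker executor. This is only an analogy under that assumption: the `CallStream` class is hypothetical, nothing crosses a network, and flow control and failure handling are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


class CallStream:
    """Local stand-in for a pipelined, ordered sequence of calls (hypothetical)."""

    def __init__(self, handlers):
        self.handlers = handlers
        # A single worker guarantees calls run in the order they were issued.
        self._worker = ThreadPoolExecutor(max_workers=1)

    def call(self, name, *args):
        # Returns immediately with a future; the caller need not wait for the reply.
        return self._worker.submit(self.handlers[name], *args)

    def close(self):
        self._worker.shutdown(wait=True)


log = []
stream = CallStream({"append": log.append, "size": lambda: len(log)})
replies = [stream.call("append", "a"), stream.call("append", "b"), stream.call("size")]
results = [reply.result() for reply in replies]   # claim the replies afterwards
stream.close()
```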

Presentation  functions  allow  heterogeneous  systems  to  communicate  in  terms  of  typed  data 
objects.  Since  its  inception,  the  Mercury  project  has  explored  the  effects  of  introducing 
user-defined  or  abstract  types  into  the  presentation  layer.  This  year,  Karen  Sollins  and  John 
Wroclawski  have  investigated  presentation  architectures  which  support  evolvable  systems,  in 
which  individual  modules  may  be  enhanced  or  replaced  gracefully  without  requiring  changes 
in  other  modules.  We  propose  a  system  which  supports  abstract  presentation  types,  a  well 
defined  model  of  type  compatibility,  and  a  set  of  system-supported  implicit  type  conversion 
rules.  We  argue  that  this  allows  decentralized  systems  to  be  specified  and  constructed  in  a 
manner  which  is  type-safe  and  precisely  captures  the  user’s  intent,  while  supporting  flexible 
and  evolvable  interrelationships  between  modules.  We  are  currently  implementing  a  Mercury 
presentation  layer  based  on  these  principles. 

Karen  Sollins  and  John  Wroclawski  implemented  a  binding  architecture  which  supports
long-lived  modules,  which  may  crash  and  restart  without  loss  of  state;  and  mobile  modules,  which
may  move  from  machine  to  machine  invisibly  to  the  client.  Mercury  remote  procedures  are
accessed  through  ports,  typed  transmissible  procedure-valued  objects  which  reference
procedures  at  remotely  available  modules.  We  implemented  the  notion  of  ports  with  pre-bound
arguments,  set  at  port  creation  rather  than  call  time.  A  principal  use  for  this  mechanism 
is  to  transparently  utilize  a  single  remote  procedure  as  the  “operation  handler”  for  many 
“objects”  by  associating  the  handler  procedure  with  a  number  of  ports,  each  containing  a 
different  “object”  as  a  pre-bound  argument.  Pre-binding  elegantly  supports  a  single
invocation  mechanism  which  can  appear  either  procedure-oriented  or  object-oriented,  depending
on  the  requirements  of  the  particular  problem. 
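Pre-bound arguments resemble partial application. The following sketch uses Python's `functools.partial` to show one handler procedure serving several "objects"; the handler, the `make_port` helper, and the counter objects are invented for the example and are local stand-ins for remote ports.

```python
from functools import partial


def counter_handler(obj, amount):
    """One procedure acting as the operation handler for many objects."""
    obj["count"] += amount
    return obj["count"]


def make_port(handler, *prebound):
    # Arguments are fixed at port creation time rather than at call time.
    return partial(handler, *prebound)


a, b = {"count": 0}, {"count": 10}
port_a = make_port(counter_handler, a)
port_b = make_port(counter_handler, b)

result_a = port_a(1)   # the caller supplies only the remaining argument
result_b = port_b(5)
```

A caller holding `port_a` sees something that behaves like a method on object `a`, while the implementer writes a single ordinary procedure.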

Rob  Austein  and  John  Wroclawski  developed  a  simple  model  for  exception  and  condition 
handling  which  replaces  the  ad  hoc  mechanisms  often  used  in  C  programs.  Our  model 
views  exceptions  as  belonging  to  a  hierarchically  organized  class  structure,  and  allows  the 
programmer  to  dynamically  bind  an  exception  handler  to  a  class  of  exceptions.  Exception
handlers  may  take  a  number  of  actions,  ultimately  either  restarting  the  computation  or 
unwinding  to  an  enclosing  stack  level.  We  implemented  this  mechanism  as  a  C  library, 
which  is  in  use  within  the  Mercury  project  and  has  been  distributed  to  several  sites  outside 
MIT. 

Rob  Austein  designed  and  implemented  a  lightweight  process  (threads)  package  for  Unix 
which  provides  features  missing  from  other  work,  including  preemptive  scheduling,  reasonable 
synchronization  primitives,  and  graceful  interaction  with  the  Unix  I/O  system. 

8.2  Mercury  Presentation  to  Open  Software  Foundation 

Barbara  Liskov,  Bill  Weihl,  and  John  Wroclawski  presented  the  Mercury  project  to  the  Open 
Software  Foundation  in  response  to  a  Request  for  Proposals  for  distributed  computing
technology.  Our  presentation  comprised  an  overview  of  the  Mercury  system,  detailed  responses
to  questions  considered  important  by  the  OSF  evaluation  team,  and  a  proposal  describing 
one  possible  integration  of  Mercury’s  ideas  into  a  larger  distributed  computing  toolbox. 

Barbara  Liskov  presented  our  submission  and  served  on  a  panel  at  the  initial  meeting  of  OSF 
member  organizations.  John  Wroclawski  attended  a  series  of  review  meetings,  discussing 
Mercury  with  the  OSF  team  and  other  interested  parties. 

Although  the  OSF  eventually  chose  to  base  their  efforts  entirely  on  commercially  available 
technology,  our  work  received  substantial  exposure  and  comment  as  a  result  of  this
presentation.

8.3  Modular  Application  Environment  (MAE) 

Mike  Frumkin  and  John  Wroclawski  are  exploring  techniques  to  make  applications  accessible 
as  building  blocks  to  relatively  naive  users.  Our  concept  is  to  define  the  behavior  of  an 
application  in  an  abstract  logical  manner,  rather  than  as  a  specific  set  of  user  interface 
actions.  We  then  present  the  same  abstract  interface  directly  to  the  user  through  an  interface 
tool  of  the  user’s  choice,  and  to  other  applications  through  a  set  of  remote  procedures.  A 
global  naming  and  support  environment  allows  users  to  specify  inter-application  links  in  an
intuitive  fashion.  MAE  uses  Mercury  as  a  communications  substrate. 

This  work,  which  will  constitute  Mike  Frumkin’s  Master’s  thesis,  was  supported  by  an  RA 
from  the  Advanced  Network  Architecture  Group  during  the  spring  of  1990. 

8.4  Synchronized  Clock  Message  Protocol 

Barbara  Liskov,  Liuba  Shrira,  and  John  Wroclawski  developed  the  Synchronized  Clock
Message  Protocol  (SCMP),  a  new  message  passing  protocol  which  provides  guaranteed  detection  of
duplicate  messages  even  when  the  receiver  has  no  state  stored  for  the  sender.  The  method  is
based  on  the  assumption  that  clocks  throughout  the  system  are  loosely  synchronized.  Our
work  shows  how  to  build  higher  level  protocols,  such  as  RPC,  which  provide  at-most-once
semantics  without  requiring  a  performance-lowering  connection  setup  step.  We  implemented
an  SCMP-based  RPC  protocol  within  the  widely  used  Sun  RPC  library,  and  showed  that
SCMP-based  at-most-once  remote  procedure  calls  could  be  provided  at  the  same  cost  as  less
desirable  RPCs  that  do  not  guarantee  at-most-once  execution.
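One way to see how loosely synchronized clocks permit duplicate detection without per-sender state is the simplified sketch below. It is an illustration under stated assumptions, not the protocol as specified: the `Receiver` class, the `max_age` parameter (taken to bound message lifetime plus clock skew), and the integer timestamps are all hypothetical.

```python
class Receiver:
    """Simplified duplicate filter driven by sender timestamps (hypothetical)."""

    def __init__(self, max_age):
        self.max_age = max_age       # assumed bound on message age plus clock skew
        self.last_seen = {}          # sender id -> newest timestamp accepted
        self.upper = float("-inf")   # unknown senders must exceed this timestamp

    def accept(self, sender, timestamp, now):
        self._expire(now)
        newest = self.last_seen.get(sender, self.upper)
        if timestamp <= newest:
            return False             # duplicate, or too old to tell: reject
        self.last_seen[sender] = timestamp
        return True

    def _expire(self, now):
        # Forget senders whose last message is older than the cutoff, and
        # remember the cutoff so that their old messages stay rejected.
        cutoff = now - self.max_age
        for sender in [s for s, ts in self.last_seen.items() if ts <= cutoff]:
            del self.last_seen[sender]
        self.upper = max(self.upper, cutoff)


receiver = Receiver(max_age=10)
first = receiver.accept("a", 100, now=101)        # new message
replay_soon = receiver.accept("a", 100, now=102)  # state for "a" still present
replay_late = receiver.accept("a", 100, now=200)  # state for "a" already discarded
fresh = receiver.accept("a", 199, now=200)        # new message, no setup step
```

The price of discarding state is that a legitimate message delayed longer than `max_age` is also rejected, so the sender must be prepared to retransmit with a fresh timestamp.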

Liuba  Shrira  presented  an  early  version  of  this  work  at  the  Second  IEEE  Workshop  on 
Workstation  Operating  Systems,  and  Barbara  Liskov  gave  a  talk  about  the  work  at  the
University  of  Arizona.  A  later  version  has  been  accepted  for  presentation  at  the  1990  SIGCOMM
conference. 

9.1  Introduction


Research  in  the  Programming  Methodology  Group  has  continued  to  focus  on  the  area  of 
distributed  computing.  We  are  working  on  a  replicated  Unix  file  system  for  use  via  the  NFS 
protocol  [259]  [276],  and  on  a  highly  available  object  repository  for  use  in  a  heterogeneous
distributed  network.  In  addition,  we  continued  our  work  on  replication  methods  and  atomic 
transactions. 


9.2  Replicated  File  System 

Our  replicated  file  system  has  the  following  goals: 

1.  It  should  provide  the  same  semantics  as  an  unreplicated  NFS  server. 

2.  It  should  be  usable  with  whatever  NFS  client  code  exists  at  the  client  machine. 

3.  We  want  to  avoid  having  our  system  depend  on  proprietary  information.  Instead,  our 
code  is  sandwiched  between  NFS  and  the  Unix  file  system  kernel.  It  is  called  by  the 
NFS  code  at  the  server,  and  in  turn  makes  calls  on  low  level  file  system  operations. 

4.  We  want  to  continue  to  provide  service  even  when  one  replica  is  crashed  or  inaccessible, 
but  have  only  two  copies  of  each  file. 

5.  We  want  to  achieve  reliability  at  least  as  good  as  a  single  server. 

6.  We  want  to  achieve  performance  at  least  as  good  as  a  single  server.  In  particular,  the
delay  observed  by  the  client  in  doing  a  read  or  write  should  be  no  greater  with  our 
service  than  with  a  single  server. 

We  plan  to  use  a  primary  copy  method  as  our  replication  technique.  Our  method  is  based 
on  our  earlier  work  on  primary  copy  methods  in  transaction  systems  [236],  but  we  adapted 
this  approach  to  match  the  needs  of  this  application.  Most  importantly,  we  take  advantage 
of  the  fact  that  we  need  not  support  general  atomic  transactions.  Instead,  each  individual 
file  system  operation  must  run  atomically,  but  support  for  combining  operations  into
multi-operation  transactions  is  not  needed.

Each  file  system  is  assigned  to  a  pair  of  servers;  one  is  the  primary  and  the  other  is  the 
backup.  The  roles  assigned  to  different  servers  can  change  when  there  are  failures,  and  at
this  point  a  third  server  becomes  involved;  failures  are  discussed  further  below.  Different
file  systems  can  be  assigned  to  different  pairs;  in  this  way  we  spread  the  load  among  the 
servers. 

In  a  primary  copy  method,  client  requests  are  sent  to  the  primary,  which  decides  what  to 
do  and  communicates  with  the  backup  as  needed.  Running  single  operation  transactions 
requires  a  two  phase  protocol.  In  phase  1,  the  primary  informs  the  backups  about  the 
operation.  When  the  backups  acknowledge  receipt  of  this  information,  the  operation  can 
commit.  At  that  point  the  primary  can  return  information  to  the  client;  the  backups  are
informed  about  the  commit  later  in  the  background  (this  is  phase  2).  Phase  1  information
must  reach  a  sufficient  number  of  backups  so  that  we  can  guarantee  that  the  information 
survives  subsequent  failures;  in  a  system  like  ours  that  is  intended  to  survive  a  single  node 
failure,  just  one  backup  is  needed. 

The  primary  maintains  a  log  in  volatile  memory  in  which  it  records  information  about 
client  operations.  The  log  is  simply  a  sequence  of  entries;  later  entries  represent  more 
recent  operations.  Typically  some  entries  are  for  operations  in  phase  1,  while  others  are  for 
operations  that  have  committed.  The  primary  distinguishes  between  these  by  maintaining 
the  CP  (the  commit  point);  this  is  the  index  of  the  latest  committed  entry.  Operations 
commit  in  entry  order. 

Operations  that  do  not  involve  modifications  to  the  file  system  are  done  entirely  by  the 
primary;  no  log  entries  are  created  for  them  and  no  communication  with  the  backup  is 
needed.  (We  discuss  how  we  guarantee  proper  serialization  for  such  operations  later  in  this 
section.)  For  a  modification  operation,  the  primary  sends  the  logged  information  to  the 
backup.  When  the  ack  arrives,  the  primary  advances  its  CP  (backups  do  acks  in  log  entry
order).  The  primary  can  work  on  many  user  requests  in  parallel;  an  operation  must  be 
delayed  only  if  it  conflicts  with  an  earlier,  uncommitted  operation. 
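
The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the group's implementation: the class name, method names, and 1-based indexing are assumptions chosen for the example. It shows only how a commit point can advance as in-order acks arrive from the backup.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryLog:
    """Volatile log kept by the primary (all names here are hypothetical)."""
    entries: list = field(default_factory=list)  # entries[i] has log index i + 1
    cp: int = 0  # commit point: index of the latest committed entry (0 = none)

    def append(self, op):
        """Record a modification (phase 1) and return its log index."""
        self.entries.append(op)
        return len(self.entries)

    def on_backup_ack(self, index):
        """The backup acks in log order, so an ack for `index` also covers
        every earlier entry; the commit point never moves backwards."""
        if index > len(self.entries):
            raise ValueError("ack for an entry that was never logged")
        self.cp = max(self.cp, index)

    def is_committed(self, index):
        return index <= self.cp


log = PrimaryLog()
first = log.append("write(f, 'a')")
second = log.append("write(g, 'b')")
log.on_backup_ack(first)  # only the first entry is committed so far
```

Because acks arrive in entry order, a single integer is enough to separate committed entries from those still in phase 1.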

Operations  are  not  applied  to  the  primary’s  file  system  until  after  they  commit,  and  this 
writing  occurs  in  the  background.  Each  server  is  equipped  with  an  uninterruptible  power 
supply  (UPS),  so  that  it  will  be  able  to  write  its  log  to  disk  in  the  case  of  a  power  failure. 

The  backup  records  information  received  from  the  primary  in  its  volatile  log.  The  primary 
also  informs  it  about  the  current  CP  in  each  message,  and  the  backup  records  this  information 
in  its  CP.  Like  the  primary,  the  backup  moves  committed  information  to  its  file  system  in 
the  background. 

The  information  in  an  entry  includes  more  than  just  the  arguments  sent  by  the  client.  For 
example,  the  primary  will  choose  the  time  at  which  a  write  operation  is  to  occur,  and  log 
this  information.  By  logging  sufficient  information,  we  can  ensure  that  the  effect  of  applying
a  client  operation  to  the  file  system  is  identical  at  both  the  primary  and  the  backup.  In 
addition,  we  can  guarantee  that  operations  are  idempotent:  even  if  an  operation  is  performed 
a  second  time  (which  can  happen  when  there  is  a  failure),  the  effect  is  the  same  as  if  it 
happened  just  once. 
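
A small Python sketch may make the idempotence argument concrete. The file system is modelled as a dictionary and the entry layout is an assumption made for illustration; the point is only that the modification time is fixed when the entry is logged, not when it is applied.

```python
def apply_write(fs, entry):
    """Apply a logged write to a file system modelled as a dict.

    The entry carries the data and the modification time chosen by the
    primary, so the result does not depend on when or where it is applied.
    """
    fs[entry["path"]] = {"data": entry["data"], "mtime": entry["mtime"]}


# Hypothetical log entry; the mtime is chosen once, at logging time.
entry = {"path": "/a", "data": b"hello", "mtime": 1000}

primary_fs, backup_fs = {}, {}
apply_write(primary_fs, entry)
apply_write(backup_fs, entry)
apply_write(backup_fs, entry)  # replayed after a failure: no further change
```

Had the server read its own clock while applying the entry, the two copies could differ and a replay would change the state.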

Information  is  removed  from  the  log  when  it  is  known  to  be  recorded  in  the  file  systems  at 
both  the  primary  and  backup.  Each  server  maintains  a  counter  called  the  AP  (the  application 
point);  all  entries  with  index  less  than  or  equal  to  the  AP  have  been  applied  to  the  file  system. 
Servers  send  their  APs  in  messages;  a  log  entry  can  be  discarded  when  it  is  known  to  be 
earlier  than  both  APs. 
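
The discard rule can be sketched as follows, reading "earlier than both APs" as "index no greater than the smaller of the two APs". That reading, the function names, and the list-based log are assumptions made for the example.

```python
def discardable_prefix(primary_ap, backup_ap):
    """Highest log index that both servers have applied to their file systems."""
    return min(primary_ap, backup_ap)


def truncate(log, first_index, primary_ap, backup_ap):
    """Drop entries with index <= min(primary_ap, backup_ap).

    `log` is a list whose first element has log index `first_index`.
    Returns the shortened list and its new first index.
    """
    cut = discardable_prefix(primary_ap, backup_ap)
    drop = max(0, cut - first_index + 1)
    drop = min(drop, len(log))
    return log[drop:], first_index + drop


# The primary has applied entries 1..3 and the backup 1..2, so 1 and 2 can go.
remaining, new_first = truncate(["e1", "e2", "e3", "e4"], 1, 3, 2)
```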

As  mentioned,  each  file  system  is  the  responsibility  of  a  pair  of  workers;  the  third  server 
acts  as  a  “witness”  for  that  system  [213]  [243].  If  one  of  the  workers  becomes  inaccessible,
the  other  worker  and  the  witness  carry  out  a  view  change  [103]  [102].  The  remaining  worker 
will  be  the  primary  of  the  new  view  and  the  witness  will  be  the  backup.  Such  a  “promoted” 
witness  keeps  a  log  just  like  a  regular  backup,  but  it  does  not  have  a  copy  of  the  file  system, 
so  it  does  not  apply  committed  requests  to  the  file  system.  Instead,  it  keeps  the  earlier
entries  in  the  log  on  disk.  (The  worker  in  a  view  with  a  promoted  witness  can  discard  entries
in  its  log  as  soon  as  they  have  been  recorded  in  its  file  system,  and  as  soon  as  that  part  of 
the  witness’  log  is  on  the  witness’  disk.)  We  are  not  yet  certain  what  we  will  do  if  the  view 
lasts  long  enough  that  the  promoted  witness’  log  becomes  too  big  to  store.  One  possibility 
is  to  keep  most  of  the  witness’  log  on  tape. 

The  view  change  algorithm  guarantees  that  any  modification  operation  that  completed  (i.e., 
that  returned  to  the  client)  will  be  recorded  in  the  system  state.  In  addition,  operations  that 
did  not  return  may  also  survive  into  the  new  state;  these  are  operations  that  made  it  to  a 
backup,  but  where  the  primary  of  the  previous  view  had  not  yet  notified  the  client. 

As  mentioned,  read  operations  are  done  locally  at  the  primary.  This  can  lead  to  a  serialization 
problem  if  a  new  view  has  formed  but  the  old  primary  does  not  know  about  it.  In  that  case, 
a  write  operation  that  committed  in  the  new  view  may  not  be  reflected  in  the  result  of  the 
read  returned  to  the  client,  even  though  the  client  may  know  that  the  write  has  happened. 
To  avoid  this  serialization  problem,  we  make  use  of  loosely  synchronized  clocks  [226]  to  define 
“time  windows”  during  which  a  view  change  will  not  happen.  Each  message  sent  by  the
backup  to  the  primary  contains  a  time  equal  to  its  clock’s  time  +  δ;  here  δ  is  on  the  order
of  a  few  seconds.  This  time  represents  a  promise  by  the  backup  not  to  start  a  new  view
until  that  time  has  passed.  The  primary  needs  to  communicate  with  the  backup  about  a
read  operation  only  if  the  time  of  its  local  clock  is  greater  than  the  promised  time  −  ε,  where
ε  is  the  clock  skew.  When  a  new  view  starts,  it  must  be  delayed  until  the  time  of  the  new
primary’s  clock  is  greater  than  the  promised  time  of  the  backup.  In  this  way,  we  guarantee 
that  there  cannot  be  a  write  that  committed  in  the  new  view  and  that  should  occur  before 
a  read  in  an  earlier  view. 
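
The three timing rules in this paragraph can be written down as a short Python sketch. The function names and the parameter names `delta` (the window length) and `epsilon` (the bound on clock skew) are assumptions made for the example; times are plain numbers in seconds.

```python
def backup_promise(backup_clock, delta):
    """Time the backup sends in each message: no new view before this time."""
    return backup_clock + delta


def can_read_locally(primary_clock, promised_time, epsilon):
    """The primary may answer a read without contacting the backup as long as
    its clock has not passed the promise, allowing for clock skew."""
    return primary_clock <= promised_time - epsilon


def may_start_new_view(new_primary_clock, promised_time):
    """A new view is delayed until the backup's promise has expired."""
    return new_primary_clock > promised_time


promise = backup_promise(100.0, 5.0)  # backup promises no new view before t = 105
```

With a skew bound of 0.5 seconds, the old primary stops serving purely local reads once its clock passes 104.5, which is before any new view can begin at 105.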


9.3  Object  Repository 

The  object  repository  has  two  goals: 

1.  It  is  intended  to  provide  a  convenient  medium  for  sharing  of  information  among
programs  written  in  many  different  programming  languages.

2.  In  addition,  it  will  provide  support  for  the  construction  and  execution  of  distributed
programs.  Components  of  these  programs  can  be  implemented  in  different
programming  languages;  communication  will  occur  through  the  repository.

We  identified  requirements  for  the  system  in  support  of  these  goals.  For  the  first  goal,  we 
believe  the  following  requirements  are  important: 

1.  The  repository  should  store  information  as  objects,  and  objects  should  be  able  to  refer 
to  one  another. 

2.  Each  object  can  be  accessed  only  by  calling  operations  of  its  type.  Mechanisms  that 
allow  users  to  define  new  types  must  be  provided.

3.  All  accesses  to  objects  happen  within  atomic  transactions,  so  that  the  repository  can
maintain  multi-object  consistency  constraints. 

4.  The  repository  must  provide  database  functionality.  In  particular,  fast  access  to  large 
collections  of  objects  is  needed. 

5.  The  repository  should  provide  reliable  persistent  storage:  with  high  probability,  it  will 
guarantee  to  preserve  information  entrusted  to  it. 

6.  The  repository  should  provide  highly  available  storage  so  that  whenever  a  client  needs 
to  access  an  object,  it  will  be  able  to  do  so  with  high  probability. 

7.  The  repository  must  be  scalable  in  many  dimensions:  object  sizes,  number  of
objects,  number  of  clients,  size  of  network  (e.g.,  local  area  network  vs.  geographically
distributed). 

8.  The  repository  must  provide  a  security  and  protection  mechanism. 

In  addition,  the  repository  must  perform  well,  but  it  is  unclear  at  this  point  what  kind  of 
performance  can  be  expected  from  such  a  system. 

These  requirements  also  apply  to  the  second  goal  (with  the  possible  exception  of  database 
functionality).  In  addition,  to  support  this  goal  we  need  to  provide  a  way  for  clients  to 
communicate  via  the  repository.  We  have  in  mind  here  some  way  for  a  client  to  “post”  a 
request  for  service  so  that  servers  for  that  service  can  find  out  about  the  request  in  a  timely 
manner.  We  note  that  this  model  of  computation  differs  from  the  more  conventional  one 
of  remote  procedure  calls  (it  bears  some  similarity  to  the  Linda  model  [116]).  It  offers  two 
advantages  over  RPCs:  reliable  communication  in  spite  of  client  and  server  failures,  and 
good  support  for  allowing  multiple  clients  to  communicate  with  multiple  servers. 

In  the  remainder  of  this  section  we  describe  the  semantics  of  the  object  universe  provided 
by  the  repository.  The  universe  contains  objects  that  clients  can  share.  Each  object  has 
a  unique  name  (an  object  identifier,  or  oid)  and  a  value.  Clients  can  identify  objects  by 
providing  their  oids,  and  objects  can  refer  to  one  another  using  oids. 

An  object  also  has  a  type  that  determines  the  set  of  operations  that  can  be  applied  to  it. 
The  repository  guarantees  that  objects  are  accessed  only  by  means  of  the  operations  of  their 
type.  Thus,  these  operations  are  the  only  way  that  the  objects’  values  can  be  observed  or 
modified. 

The  repository  provides  a  rich  set  of  builtin  types  (e.g.,  integers,  booleans,  characters)  and 
constructors  (e.g.,  arrays,  records,  unions).  Sets  will  be  provided  as  a  builtin  constructor, 
and  clients  can  cause  indexes  to  be  provided  for  sets,  to  speed  up  queries.  (We  may  extend 
this  ability  to  “set-like”  constructors.)  In  addition,  users  can  define  new  abstract  types  for 
the  repository.  We  are  currently  working  on  a  method  to  allow  efficient  maintenance  of 
indexes  on  sets  where  the  objects  in  the  set  are  of  abstract  type. 

Clients  of  the  repository  interact  with  it  by  invoking  operations  on  its  objects.  These  calls 
(from  clients  to  the  repository)  follow  call-by-value  semantics.  Arguments  and  results  are
usually  oids.  For  example,  a  call  to  add  an  employee  to  a  set  of  employees  would  take  the
oid  of  the  set  and  the  oid  of  the  employee  as  arguments. 

However,  sometimes  actual  values  of  objects  are  needed.  For  example,  suppose  one  wanted 
to  add  the  number  ten  to  an  array  of  integers.  While  ten  is  conceptually  an  object  in  the 
repository,  it  hardly  makes  sense  to  require  a  client  to  know  what  ten’s  oid  is.  Instead, 
along  with  oids,  it  must  be  possible  to  use  values  as  arguments  and  results.  For  a  given  type 
T  in  the  repository,  T  can  have  an  external  representation,  ext(T);  this  is  a  description  of 
the  format  of  the  data  that  a  client  will  receive  (when  reading  the  value)  or  provide  (when 
specifying  a  value).  The  external  representation  is  similar  to  a  message  representation  used 
for  communication  in  distributed  systems  [145]  [189].  It  differs  from  the  way  the  object  is
represented  within  the  repository,  and  also  from  the  way  the  value  will  be  represented  in  the 
client  program.  Typically,  translations  are  provided  on  both  ends:  the  internal  representation 
is  encoded  to  produce  the  external  representation  by  the  provider  of  the  value,  and  the 
external  representation  is  decoded  to  produce  the  internal  representation  used  by  the  receiver. 

Within  the  repository,  if  the  definer  of  type  T  provides  an  encode  routine  from  T  to  ext(T), 
then  it  is  legal  to  return  result  values  of  type  T  to  client  programs.  If  the  definer  provides 
a  decode  routine  from  ext(T)  to  T,  then  it  is  legal  to  accept  argument  values  of  type  T 
from  client  programs.  The  simple  builtin  types  (integers,  reals,  characters,  etc.)  will  provide 
encode  and  decode  routines;  thus,  like  most  object  repositories  that  use  an  abstract  type 
approach,  simple  values  can  be  used  as  arguments  and  results.  Unlike  most  other  object 
repositories  (e.g.,  Encore  [268]),  values  of  user-defined  type  can  also  be  arguments  or  results, 
provided  the  type  has  the  encode  or  decode  routine. 
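
The encode/decode rule can be illustrated with a minimal Python sketch. The `Point` type, the two registries, and the use of JSON as the external representation are all assumptions made for the example; the repository's actual external format is not specified here.

```python
import json


class Point:
    """A user-defined type as it might exist inside the repository."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Per-type routines supplied by the type's definer (hypothetical registries).
# A type with no entry can be passed only by oid.
ENCODERS = {Point: lambda p: json.dumps({"x": p.x, "y": p.y})}  # T -> ext(T)
DECODERS = {Point: lambda s: Point(**json.loads(s))}            # ext(T) -> T


def return_value(obj):
    """Returning a value is legal only if the type has an encode routine."""
    encode = ENCODERS.get(type(obj))
    if encode is None:
        raise TypeError("type has no encode routine; return an oid instead")
    return encode(obj)


def accept_argument(typ, external):
    """Accepting a value is legal only if the type has a decode routine."""
    decode = DECODERS.get(typ)
    if decode is None:
        raise TypeError("type has no decode routine; pass an oid instead")
    return decode(external)
```

A type may supply only one of the two routines, in which case its values can travel in one direction only.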

The  types  of  objects  in  the  repository  are  language-independent  and  a  way  is  needed  for 
describing  them  that  is  independent  of  any  programming  language  (such  as  the  one  used  to 
implement  them).  This  can  be  accomplished  by  providing  a  “type  description”  that  defines 
the  names  and  signatures  for  the  type’s  operations  and  also  the  external  representation  if 
the  type  is  to  be  transmitted  as  a  value.  The  signatures  will  be  defined  in  terms  of  other 
types  known  to  the  repository.  In  addition,  the  definer  needs  to  give  a  specification  so 
that  programmers  who  wish  to  use  objects  of  the  new  type  can  understand  its  meaning. 
The  important  point  to  notice  here  is  that  all  of  this  can  be  done  without  defining  an 
implementation  for  the  new  type,  and  the  information  is  independent  of  the  programming 
language  that  will  be  used  to  implement  the  type.  Thus,  abstract  data  types  provide  a 
means  for  programs  in  different  languages  to  communicate. 

To  provide  good  performance,  we  will  allow  clients  to  invoke  several  operations  in  one  call; 
this  facility  is  needed,  for  example,  to  run  queries  efficiently.  We  have  not  yet  decided  on 
how  powerful  the  language  for  defining  such  “combined  operations”  will  be  (e.g.,  whether  it 
will  just  provide  expressions,  or  whether  general  programs  can  be  written).  All  operation 
calls  must  take  place  within  atomic  transactions;  a  transaction  can  contain  one  or  more  calls. 

In  addition  to  calls,  we  will  also  provide  a  means  for  clients  to  post  requests  for  service,
and  to  request  notification  when  such  requests  arrive.  We  have  not  yet  decided  how  this 
mechanism  will  work.

9.4  Disconnected  Actions

Boaz  Ben-Zvi  has  completed  his  Master’s  thesis  on  disconnected  actions.  This  work  assumes 
the  nested  transaction  model  supported  by  Argus  [200].  A  topaction  is  a  transaction  that 
has  no  parent;  when  it  commits,  its  results  become  permanent.  An  action  is  allowed  to 
create  subactions,  whose  commits  are  relative  to  the  parent:  when  a  subaction  commits,  its
effects  become  visible  to  the  parent,  but  if  the  parent  aborts  later,  this  undoes  the  effects  of 
the  subaction.  In  Argus,  a  parent  action  stops  running  when  it  creates  a  subaction,  and  all 
subactions  must  terminate  before  it  is  allowed  to  continue  running. 
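
The relative-commit rule for subactions can be sketched in Python as follows. This is an illustration under simplifying assumptions (no locking, no concurrency, a dictionary as the permanent store) and is not Argus code; the class and method names are hypothetical.

```python
class Action:
    """A topaction or subaction holding tentative writes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}

    def subaction(self):
        return Action(parent=self)

    def write(self, key, value):
        self.writes[key] = value

    def read(self, key, store):
        """Look through this action and its ancestors, then the permanent store."""
        action = self
        while action is not None:
            if key in action.writes:
                return action.writes[key]
            action = action.parent
        return store.get(key)

    def commit(self, store):
        """A subaction hands its writes to its parent; a topaction makes them permanent."""
        target = self.parent.writes if self.parent is not None else store
        target.update(self.writes)
        self.writes = {}

    def abort(self):
        """Discard tentative writes, including those inherited from committed subactions."""
        self.writes = {}


store = {}
top = Action()
sub = top.subaction()
sub.write("x", 1)
sub.commit(store)  # now visible to `top`, but not yet permanent
```

If `top` later aborts, the write made by `sub` disappears with it; only a commit of `top` moves it into `store`.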

Disconnected  actions  are  subactions  that  are  allowed  to  run  in  parallel  with  their  ancestors; 
the  only  constraint  is  that  all  disconnected  actions  must  terminate  before  their  topaction 
ancestor  is  permitted  to  commit.  They  are  useful  to  allow  lazy  propagation  of  information. 
For  example,  consider  a  replicated  system  in  which  a  read  or  write  must  be  done  to  three 
out  of  the  five  replicas.  In  such  a  system,  writing  to  more  than  three  replicas  can  improve 
the  performance  of  later  reads  because  the  needed  information  may  reside  at  replicas  that 
are  closer  to  the  reader.  However,  the  transaction  doing  the  write  should  not  need  to  be 
delayed  while  the  writing  to  the  additional  replicas  occurs.  Disconnected  actions  can  be  used 
to  allow  the  extra  writing  to  go  on  in  the  background,  while  the  parent  continues  to  do  other 
work. 

Ben-Zvi  worked  out  the  locking  and  commit  rules  that  are  needed  to  support  disconnected 
actions.  He  investigated  several  different  approaches.  For  example,  the  commit  of  a  topaction 
can  be  made  conditional  on  some  number  of  a  set  of  disconnected  action  descendants
committing;  if  at  least  that  many  have  committed  when  the  topaction  tries  to  commit,  the
topaction  commits  immediately  and  any  of  the  disconnected  actions  that  have  not
terminated  are  aborted.
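
The conditional commit rule just described can be sketched as a small Python decision function. Only the first branch comes from the text; what happens when the threshold has not yet been reached is not stated here, so the "wait" and "abort" outcomes below are assumptions, as are all names.

```python
def try_commit_topaction(disconnected, required):
    """Decide the fate of a topaction that has disconnected descendants.

    `disconnected` maps a descendant's name to its state: "committed",
    "running" or "aborted". Returns (decision, names_to_abort).
    """
    committed = [n for n, s in disconnected.items() if s == "committed"]
    running = [n for n, s in disconnected.items() if s == "running"]
    if len(committed) >= required:
        # Enough descendants have committed: commit now, abort the stragglers.
        return "commit", running
    if len(committed) + len(running) >= required:
        return "wait", []  # assumption: the threshold can still be met
    return "abort", running  # assumption: the threshold can no longer be met
```

For instance, with extra writes to replicas `r4` and `r5` and a threshold of one, the topaction commits as soon as either write has committed, and the other is aborted if it is still running.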

9.5  Viewstamp  Replication  in  Argus 

Sanjay  Ghemawat  [117]  implemented  Oki’s  viewstamp  replication  mechanism  [235]  [236]  and 
measured  the  performance  of  the  implementation.  His  work  took  place  within  the  Argus 
system  [198]  [200].  Argus  guardians  are  resilient  to  node  failures  because  their  state  variables 
are  maintained  on  stable  storage  [184].  Having  resilient  guardians  supports  the  construction 
of  highly  reliable  systems  that,  with  high  probability,  do  not  lose  information  entrusted  to 
them.  However,  it  does  not  support  high  availability:  if  a  guardian’s  node  is  crashed  or 
inaccessible  because  of  a  network  failure,  clients  will  be  unable  to  use  the  guardian. 

Ghemawat  implemented  a  version  of  Argus  in  which  each  guardian  is  implemented  as  three 
replicas.  One  of  the  replicas  is  the  primary;  the  others  are  backups.  All  operation  calls  are 
performed  at  the  primary.  Whenever  a  guardian  would  write  some  information  to  stable 
storage  in  the  original  Argus  implementation  (e.g.,  as  part  of  committing  a  transaction), 
the  primary  sends  that  information  to  the  backups.  In  case  of  a  failure  of  the  primary,  the 
backups  perform  a  view  change  [102][103],  and  one  of  them  becomes  the  new  primary.  In 
this  way,  the  guardian  as  a  whole  is  highly  available;  its  state  is  also  highly  reliable  provided 
each  replica  has  an  uninterruptible  power  supply  (UPS)  that  permits  it  to  write  volatile 
information  to  disk  in  the  event  of  a  power  failure.

Figure  9.1:  The  time  (in  milliseconds)  required  to  run  a  topaction  consisting  of  one  handler
call.  Original  refers  to  the  unreplicated  system;  a,  b  refers  to  a  replicated  system  with  a
replicas  of  the  client  and  b  replicas  of  the  server.

Ghemawat  measured  the  performance  of  his  system  and  compared  it  with  that  of  Argus. 
For  example,  Figure  9.1  shows  the  performance  of  a  topaction  that  performed  a  single  handler
call  that  either  observed  the  state  or  modified  it.  The  figure  shows  that  in  the  case  of 
modifications,  the  new  system  performs  substantially  better  than  Argus;  this  is  because  in 
Argus  the  new  information  must  be  written  to  disk,  which  takes  longer  than  the  roundtrip 
message  required  in  the  new  system.  In  fact,  the  situation  in  Argus  is  actually  worse  than  it 
appears,  since  Argus  implements  stable  storage  with  a  single  disk  instead  of  two  disks  with 
synchronous  writes  (which  is  what  is  really  needed)  [184].  On  the  other  hand,  the  new  system 
degrades  the  performance  of  reads  because  our  implementation  requires  a  roundtrip  message 
delay  in  a  case  where  no  disk  write  is  done  in  Argus.  (The  “time  window”  optimization  for 
the  replicated  file  system  discussed  in  Section  9.2  avoids  this  delay;  with  this  optimization, 
reads  should  perform  the  same  in  the  replicated  system  as  in  Argus.) 

Since  the  replication  scheme  requires  all  replicas  to  have  disks  and  UPSs,  it  is  probably 
not  appropriate  for  use  with  all  Argus  guardians.  Instead,  it  should  be  used  for  important 
services,  such  as  the  object  repository.  In  addition,  we  believe  that  the  replication  technique 
would  work  well  in  a  stable  storage  service,  which  provides  highly  available  and  reliable 
storage  for  guardians  as  a  service  in  a  network.  We  are  investigating  such  an  approach  and 
intend  to  implement  the  service  and  compare  its  performance  with  other  techniques  [85]  [77]. 

9.6  Orphan  Detection 

Steve  Markowitz  [221]  completed  his  Master’s  thesis  in  which  he  implemented  the  “map 
server”  scheme  for  doing  orphan  detection  in  Argus  and  compared  its  performance  with  the 
“deadline”  technique. 

An  orphan  is  a  computation  whose  results  are  no  longer  required.  Orphans  are
undesirable  because  they  waste  system  resources  and  because  they  sometimes  observe  inconsistent
information.  Therefore  Argus  provides  a  method  of  detecting  orphans  so  that  they  can  be
destroyed.  Our  method  requires  sending  orphan-detection  information  in  almost  all  messages
[202].  Since  the  information  grows  without  bound,  the  technique  is  only  feasible  if  it  can 
be  optimized  in  a  way  that  keeps  the  size  of  messages  and  the  size  of  the  orphan  detection 
information  small. 

We  developed  two  techniques  for  optimizing  our  basic  method.  The  deadline  strategy  does 
this  by  limiting  the  lifetimes  of  certain  entities  (such  as  topactions),  which  in  turn  limits
the  time  that  orphan  information  must  be  retained  [288].  The  map  server  strategy  keeps
the  information  in  a  highly  available  central  service;  the  service  is  able  to  do  garbage
collection  and  thus  bound  the  information  size,  and  furthermore,  it  associates  small  timestamps
with  system  states,  and  these  timestamps  are  sent  in  messages  instead  of  the  information 
they  identify  [202].  The  deadline  strategy  was  implemented  and  analyzed  by  Nguyen  [229]; 
Markowitz  has  now  done  the  same  thing  for  the  map-server  technique.  He  discovered  that 
although  both  techniques  work  well  in  small  systems,  the  deadline  scheme  appears  to  work 
better  in  large  systems  because  it  exhibits  better  locality. 


9.7  Lazy  Replication 

Rivka  Ladin,  Barbara  Liskov,  and  Liuba  Shrira  have  continued  to  work  on  the  replication 
method  developed  by  Ladin  in  her  Ph.D.  thesis  [181].  A  paper  on  this  work  [182]  will  appear 
in  the  Proceedings  of  the  Ninth  ACM  Symposium  on  Principles  of  Distributed  Computing.  In 
addition,  Liskov  and  Sanjay  Ghemawat  constructed  a  simple  implementation  of  the  method
for  a  particular  application,  with  the  goal  of  determining  the  cost  of  the  replication
technique.  To  determine  this,  we  are  carrying  out  experiments  that  compare  the  performance
of  the  replicated  service  with  an  unreplicated  one  for  the  same  application.  Preliminary 
results  indicate  that  the  replicated  service  provides  response  times  comparable  to  those  of 
the  unreplicated  one;  we  are  working  on  a  study  to  determine  and  compare  the  capacities  of 
the  two  systems. 

9.8  Avoiding  Recursion  Deadlock 

Eric  Brewer  and  Carl  Waldspurger  explored  issues  of  locking  and  serialization  in  concurrent 
object-oriented  programming  languages.  Their  work  focused  on  the  problem  of  recursion 
deadlock  in  current  actor  systems  [214] [297].  The  restrictions  on  recursion  in  these  systems 
hinder  the  use  of  abstract  modules,  and  force  programmers  to  rethink  algorithms  to  avoid 
recursion.  Brewer  and  Waldspurger  developed  two  mechanisms  for  solving  this  problem:  a 
novel  technique  using  multi-ported  actors,  and  a  named  threads  scheme  that  borrows  from
previous  work  in  distributed  computing.

10.1  Introduction

The  Programming  Systems  Research  Group  works  in  two  areas:  programming  language
technology  for  parallel  systems,  and  distributed  database  technology  for  large  scale  information
systems. 

10.2  Community  Information 

The  Boston  Community  Information  System  is  a  large  scale  information  system  in  use  at 
over  150  sites  in  the  Boston  area.  It  provides  New  York  Times  and  Associated  Press  news 
wires  to  users  via  digital  broadcast  (to  PCs),  remote  procedure  calls,  and  electronic  mail.  We 
published  a  description  of  our  system  in  Communications  of  the  ACM  (see  list  of  publications 
below). 

Since  last  year,  we  doubled  the  number  of  users  served  by  the  electronic  mail  component 
of  the  system  to  over  100.  We  conducted  a  survey  of  the  electronic  mail  users  and
compiled  quantitative  data  for  a  forthcoming  report.

10.3  FX  Programming 

The  FX  programming  language  incorporates  a  fundamental  compiler  technology  for
parallel  computers  that  can  be  used  by  a  range  of  existing  programming  languages  without
modification.  This  compiler  technology  is  based  on  a  new  formal  technique  for  statically 
scheduling  expressions  for  parallel  execution.  We  are  using  implementation  experience  and 
experimentation  to  guide  the  development  of  the  technology. 

10.3.1  Approach 

Our  approach  to  parallel  computing  seeks  to  develop  a  new  scientific  basis  for  parallel
program  execution  that  retains  the  simple  semantics  present  in  sequential  programming
languages.  Our  approach  is  unlike  other  approaches  to  parallel  computing  because  it  does  not
require  primitives  for  explicit  parallelism  (although  it  can  accommodate  them)  and  because 
it  does  not  forbid  side  effects  in  programs. 

The  foundation  of  our  work  is  an  effect  system  that  permits  us  to  statically  determine
expression  scheduling  constraints  and  thus  decompose  a  program  at  compile  time  for  parallel
execution.  Just  as  a  type  system  describes  what  each  expression  in  a  program  computes,  an
effect  system  describes  how  each  expression  computes.  For  example,  an  effect  system  can
determine  expression  side  effects  (read,  write,  and  initialize  regions  of  memory),  control  effects
(representing  control  transfers),  and  communication  effects  (between  cooperating  processes). 
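
To illustrate how side effects can yield scheduling constraints, the following Python sketch treats an effect as a pair of region sets and allows two expressions to run in parallel only when neither writes a region the other touches. This is a simplified interference test written for the example, not the FX effect system itself: it omits initialize, control, and communication effects, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    """Side effects of one expression, as sets of memory-region names."""
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def interfere(a, b):
    """Two expressions conflict if one writes a region the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def can_run_in_parallel(a, b):
    return not interfere(a, b)


reader = Effect(reads=frozenset({"r1"}))
copier = Effect(reads=frozenset({"r1"}), writes=frozenset({"r2"}))
updater = Effect(writes=frozenset({"r1"}))
```

Here `reader` and `copier` only share a region that both read, so they may be scheduled together, while `updater` must be ordered with respect to both.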

Our  experimental  results  with  a  prototype  compiler  suggest  that  effects  are  a  useful  way  of
discovering  and  exploiting  parallelism  in  complex  programs  without  burdening  the
programmer.  Our  experimental  methodology  is  based  upon  testing  our  technology  on  programmers
outside  of  our  own  research  group  by  an  iterative  cycle  of  language  design,  implementation,
and  test.

10.3.2  Recent  Accomplishments

We  completed  experiments  using  our  parallel  compiler  for  FX-87  that  show  a  speedup  of
between  two  and  five  times  under  ideal  conditions.  New  systems  for  type  inference,  effect
inference  and  first  class  modules  have  been  implemented  in  the  context  of  FX-90,  and  our 
implementations  are  being  used  outside  of  our  research  group. 

In  addition,  we  completed: 

•  a  dialect  of  FX,  in  use  at  other  research  sites  and  in  an  MIT  graduate  course.  Based 
upon  user  feedback,  the  language  has  been  updated  for  easier  use. 

•  a  prototype  FX-90  implementation  incorporating  type  and  effect  inference:  no
declarations  are  necessary  to  get  the  benefit  of  type  and  effect  analysis.

•  an  extension  of  our  earlier  system  of  first  class  modules  and  static  dependent  types. 
Effects  enable  a  new  approach  to  type  systems  that  permits  modules  to  be  treated  as 
run  time  values,  a  first  for  statically  typed  languages.  Modules  can  be  dynamically 
composed  to  construct  new  systems.  This  system  has  been  implemented  as  part  of  the 
prototype  FX-90  implementation. 

10.3.3  Plans  for  FY91 

In  the  following  year,  we  intend  to: 

•  complete  FX  technology  experiments  under  real  conditions  on  the  Encore  Multimax 
and  Connection  Machine  and  publish  the  experimental  results;  and 

•  encourage  technology  transfer  by  publishing  the  complete  documentation  of  FX  and 
making  the  implementation  widely  available. 

10.3.4  Technology  Transfer 

Outgoing  technology  transfer  activity  is  split  into  two  parts.  First,  through  our  scientific 
publications  and  experimental  results,  we  seek  to  influence  other  groups  to  understand  and 
use  the  technology  we  develop.  Second,  we  support  the  transfer  of  FX  implementations  to 
interested  users.  To  date,  we  sent  the  implementations  of  FX  to  approximately  three  other 
university  research  groups,  and  we  have  received  feedback  from  these  users. 

In  addition,  we  are  working  on  transferring  our  distributed  database  technology,  developed
under  a  previous  DARPA  contract,  to  outside  firms  for  commercialization.  This
technology  is  used  at  over  100  sites  world  wide. 

Incoming  technology  transfer  in  the  programming  language  area  is  facilitated  by  our  constant 
interaction  with  other  DARPA  contractors,  including  Stanford,  Yale,  and  CMU. 

10.3.5  Other  Information 

Pierre  Jouvelot  has  joined  our  research  group  as  a  visiting  scientist,  and  is  working  on 
adapting  FX  technology  to  the  Connection  Machine.  We  hosted  many  visitors,  including 
visitors  from  industry,  over  the  past  year  and  discussed  our  ongoing  research.  In  September 
1990  David  Gifford  was  awarded  the  Karl  Van  Tassel  Career  Development  Professorship.

11.1  Introduction

Spoken  language  input  to  computers  is  a  major  goal  in  our  research  in  developing  a  graceful 
human-machine  interface.  Despite  some  recent  successful  demonstrations  of  speech  recognition
capabilities,  current  systems  typically  fall  far  short  of  human  capabilities  of  continuous
speech  recognition  with  essentially  unrestricted  vocabulary  and  speakers,  under  difficult 
acoustic  environments.  Our  approach  to  this  problem  is  to  seek  a  good  understanding  of 
human  communication  through  spoken  language,  to  capture  the  essential  features  of  the 
process  in  appropriate  models,  and  to  develop  the  necessary  computational  framework  to 
make  use  of  these  models  for  machine  understanding. 

It  is  our  belief  that  the  development  of  advanced  human/machine  communication  systems 
will  require  expertise  in  signal  processing,  system  theory,  pattern  recognition,  and  computer 
science,  built  on  a  solid  understanding  of  speech  science  and  linguistics.  We  place  heavy 
emphasis  on  designing  systems  that  can  make  use  of  the  knowledge  gained  over  the  past 
four  decades  in  human  communication,  with  the  hope  that  such  systems  will  one  day  have  a 
performance  approaching  that  of  humans.  Specifically,  our  approach  is  based  on  the  following 
premises: 

•  The  speech  signal  contains  information  regarding  the  intended  linguistic  message.  It
also  contains  information  on  the  acoustic  environment  and  the  identity  and  physiological/psychological
states  of  the  speaker.  As  far  as  speech  recognition  is  concerned,  the
latter  sources  of  information  can  be  considered  as  undesirable  noise.  Robust  speech
recognition  is  critically  tied  to  our  ability  to  successfully  extract  the  linguistic  information
and  discard  those  aspects  that  are  extra-linguistic.

•  Past  research  in  spoken  language  communication  has  established  phonemes  as  psychologically
real  units  for  representing  words  in  the  lexicon.  Therefore,  phonemes  and
other  equivalent  descriptors,  such  as  distinctive  features  and  syllables,  are  the  most 
appropriate  units  to  relate  words  to  the  speech  signal  for  machine  recognition  as  well. 

•  While  phonemes  are  discrete  abstract  linguistic  entities,  their  acoustic  realizations  in 
speech  are  inherently  continuous,  reflecting  the  movement  of  the  articulators  from  one 
position  to  the  next.  Many  of  the  acoustic  cues  for  phonetic  contrasts  are  encoded  at 
specific  times  in  the  speech  signal.  In  order  to  fully  utilize  these  acoustic  attributes, 
we  believe  that  one  must  explicitly  establish  acoustic  landmarks  in  the  signal. 

•  Previous  attempts  at  explicit  utilization  of  speech  knowledge  have  resulted  in  the 
development  of  systems  that  are  based  on  heuristic  rules.  Such  efforts  typically  require 
intense  knowledge  engineering,  and  as  such  are  often  hampered  by  the  lack  of  a  unified 
control  strategy.  As  a  result,  system  development  is  slow,  and  the  performance  fragile. 
In  contrast,  we  seek  to  make  use  of  the  available  speech  knowledge  by  embedding  such 
knowledge  in  a  formal  framework  whereby  powerful  mathematical  tools  can  be  utilized 
to  optimize  its  use. 

•  Despite  significant  advances  made  in  phonetics,  phonology,  and  other  aspects  of  linguistics
over  the  past  decades,  we  still  lack  a  complete  understanding  of  the  human  speech
communication  process.  To  deal  with  our  present  state  of  ignorance  and  the  inherent
variability  that  exists  throughout  the  process,  the  speech  recognition  system  must  have 
a  stochastic  component.  However,  it  is  our  belief  that  speech-specific  knowledge  will 
enable  us  to  build  more  sophisticated  stochastic  models  than  what  is  currently  being 
attempted,  and  to  reduce  the  amount  of  training  data  necessary  for  high  performance. 

•  The  ultimate  goal  of  our  research  is  the  understanding  of  the  spoken  message,  and  the 
subsequent  accomplishment  of  a  task  based  on  this  understanding.  To  achieve  this 
goal,  we  must  fully  integrate  the  speech  recognition  part  of  the  problem  with  natural 
language  processing  so  that  higher  level  linguistic  and  pragmatic  constraints  can  be 
utilized. 

•  The  development  of  a  spoken  language  understanding  system  will  require  interactions 
with  several  disciplines  in  computer  science.  Parallel  computing  will  be  necessary  for 
real  time  processing.  Efficient  algorithms  can  greatly  reduce  the  search  space  for  the 
recognition  process.  Finally,  theories  of  learning  will  help  the  system  to  adapt  to  new 
speakers,  environments,  and  tasks. 

The  research  projects  in  the  Spoken  Language  Systems  Group  fall  into  several  areas.  First, 
a  number  of  basic  research  topics  are  being  explored.  These  include  the  formulation  and 
testing  of  various  computational  models  for  human  auditory  processing,  speech  perception, 
and  natural  language  processing  that  are  suitable  for  spoken  language  understanding.  We 
are  also  attempting  to  quantify  the  acoustic  cues  for  phonetic  contrasts,  and  the  effects  of 
speaking  rate  and  style  on  the  acoustic  properties  of  speech.  Secondly,  these  research  results 
are  funneled  into  the  development  of  an  experimental  spoken  language  system.  Thirdly, 
alternative  approaches  to  speech  recognition,  including  the  use  of  artificial  neural  nets  and 
strategies  derived  from  vision  research,  are  being  explored.  Finally,  part  of  our  effort  is 
devoted  to  the  development  of  the  necessary  infrastructure,  including  the  development  of 
speech  research  tools  and  databases. 

11.2  Research  Reports 

11.2.1  Continuous  Speech  Recognition:  The  SUMMIT  System 

Recently,  we  put  together  a  speech  recognition  system  which  embodies  some  of  the  research 
that  we  have  been  conducting  in  automatic  speech  recognition.  The  system,  which  we 
call  SUMMIT,  is  intended  to  serve  as  a  testbed  for  a  segmental-based  approach  to  speech 
recognition.  In  addition,  it  enables  us  to  explore  how  speech  recognition  can  be  integrated 
with  natural  language  processing  in  order  to  achieve  speech  understanding. 

The  SUMMIT  system  starts  the  recognition  process  by  first  transforming  the  speech  signal  into 
a  representation  that  models  some  of  the  known  properties  of  the  human  auditory  system 
[265].  The  representation  is  illustrated  in  Figure  11.1(a),  for  the  sentence  “Where  is  the 
nearest  hospital?”  Using  the  output  of  the  auditory  model,  acoustic  landmarks  of  varying 
robustness  are  located  and  embedded  in  a  hierarchical  structure  called  a  dendrogram  [118], 
as  shown  in  Figure  11.1(b).  The  acoustic  segments  in  the  dendrogram  are  then  mapped
to  phoneme  hypotheses,  using  a  set  of  automatically  determined  acoustic  parameters  in
conjunction  with  conventional  pattern  recognition  algorithms  [246].  The  result  is  a  phoneme
network,  in  which  each  arc  is  characterized  by  a  vector  of  probabilities  for  all  the  possible
candidates,  as  shown  in  Figure  11.1(c).

Figure  11.1:  Intermediate  representation  leading  to  the  recognition  of  the  sentence,  “Where
is  the  nearest  hospital?”  The  display  contains:  (a)  synchrony  spectrogram,  (b)  a  dendrogram
describing  the  multi-level  acoustic  segmentation,  (c)  a  phonetic  recognition  network,  (d)  a
word  pronunciation  network,  and  (e)  the  recognition  result.

Words  in  the  lexicon  are  represented  as  pronunciation  networks,  which  are  generated  automatically
by  a  set  of  phonological  rules.  This  is  illustrated  in  Figure  11.1(d)  for  the  word
“hospital.”  Probabilities  derived  from  training  data  are  assigned  to  each  arc,  using  a  corrective
training  procedure,  to  reflect  the  likelihood  of  a  particular  pronunciation.  Presently,
lexical  decoding  is  accomplished  by  using  the  Viterbi  algorithm  to  find  the  best  path  that 
matches  the  acoustic-phonetic  network  with  the  lexical  network.  The  recognized  word  string 
is  shown  in  Figure  11.1(e). 
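
The  lexical  decoding  step  described  above  can  be  illustrated  with  a  small  sketch.  It  is  not
the  SUMMIT  implementation:  for  simplicity  it  assumes  a  linear  sequence  of  acoustic  segments
(rather  than  a  full  acoustic-phonetic  network),  one  network  arc  per  segment,  and  hypothetical
names  (viterbi_decode,  segments,  arcs)  and  probabilities  chosen  only  for  illustration.

```python
import math

def viterbi_decode(segments, arcs, start, end):
    """Find the best path through a pronunciation network, one arc per segment.

    segments: list of dicts mapping phone label -> classifier probability.
    arcs: list of (source state, destination state, phone, arc probability).
    Returns (log score, phone sequence) for the best path, or None.
    """
    # best[state] = (log score, phones so far) for the best partial path
    best = {start: (0.0, [])}
    for seg in segments:
        nxt = {}
        for src, dst, phone, arc_prob in arcs:
            obs_prob = seg.get(phone, 0.0)
            if src not in best or obs_prob <= 0.0 or arc_prob <= 0.0:
                continue
            score = best[src][0] + math.log(arc_prob) + math.log(obs_prob)
            # Keep only the highest-scoring way of reaching each state
            if dst not in nxt or score > nxt[dst][0]:
                nxt[dst] = (score, best[src][1] + [phone])
        best = nxt
    return best.get(end)

# Hypothetical two-segment input and a two-pronunciation network for "the"
segments = [{"dh": 0.9, "th": 0.1}, {"ah": 0.4, "iy": 0.6}]
arcs = [(0, 1, "dh", 1.0), (1, 2, "ah", 0.7), (1, 2, "iy", 0.3)]
log_score, phones = viterbi_decode(segments, arcs, start=0, end=2)
```

In  this  toy  case  the  more  likely  pronunciation  arc  outweighs  the  slightly  better  acoustic
match  of  the  alternative,  which  is  the  kind  of  trade-off  the  arc  probabilities  are  meant  to
capture.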

We  recently  evaluated  SUMMIT’s  performance  in  a  number  of  ways.  Phonetic  classification
performance  was  evaluated  by  comparing  the  labels  provided  by  the  classifier  to  those  in  a
time-aligned  transcription,  using  38  context-independent  phone  labels  [302].  This  particular
set  was  selected  because  it  has  been  used  in  other  recent  evaluations  within  the  DARPA
community.  For  a  single  speaker,  the  top-choice  classification  accuracy  was  77%,  and  the  correct
label  was  within  the  top  three  nearly  95%  of  the  time.  For  multiple  and  unknown  speakers,
the  top-choice  accuracy  was  about  70%,  and  the  correct  choice  was  within  the  top  three  over
90%  of  the  time.  Figure  11.2  shows  the  rank  order  statistics  for  both  the  speaker-dependent
and  speaker-independent  cases.

Figure  11.2:  Rank  order  statistics  for  the  current  phone  classifier  on  a  speaker-independent
task.  There  are  38  context-independent  phone  labels:  14  vowels,  3  semivowels,  3  nasals,  8
fricatives,  2  affricates,  6  stops,  1  flap,  and  one  for  silence.
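
Rank  order  statistics  of  this  kind  can  be  computed  as  in  the  sketch  below,  which  reports
the  fraction  of  tokens  whose  correct  label  appears  within  the  top  1,  2,  ...,  N  classifier
choices.  The  function  name  and  the  toy  data  are  hypothetical  and  are  not  drawn  from  the
evaluation  itself.

```python
def rank_order_accuracy(ranked_hypotheses, references, max_rank=3):
    """Return [top-1, top-2, ..., top-max_rank] accuracies as fractions."""
    counts = [0] * max_rank
    for ranked, ref in zip(ranked_hypotheses, references):
        top = ranked[:max_rank]
        if ref in top:
            # A label found at rank r counts toward top-r and every wider cutoff
            for r in range(top.index(ref), max_rank):
                counts[r] += 1
    return [c / len(references) for c in counts]

# Hypothetical classifier output for four tokens (best choice first)
ranked = [["aa", "ae", "ah"], ["iy", "ih", "eh"], ["s", "z", "sh"], ["m", "n", "ng"]]
truth = ["aa", "ih", "sh", "l"]
accuracies = rank_order_accuracy(ranked, truth)  # top-1, top-2, top-3
```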

Word  accuracy  for  the  SUMMIT  system  was  evaluated  on  the  DARPA  1000-word  Resource 
Management  task  [301].  Two  different  speaker-independent  test  sets  provided  by  NIST,  consisting
of  150  and  300  sentences,  respectively,  were  used  [241].  The  SUMMIT  system  achieved
a  word  accuracy  of  87.1%  and  86.4%  on  the  two  test  sets,  respectively,  using  the  designated
word-pair  grammar  with  perplexity  of  60,  and  approximately  70  context-independent  phone
models.  SUMMIT’s  performance  compares  favorably  with  systems  that  are  based  on  hidden
Markov  modeling,  when  evaluated  on  the  same  data  and  using  a  similar  number  of  phone
models  [186].  Since  other  researchers  have  been  able  to  improve  their  system’s  performance
by  increasing  the  number  of  models  to  accommodate  context-dependency,  we  expect  that
we  can  similarly  improve  SUMMIT’s  performance.
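
The  scoring  formula  behind  these  word  accuracy  figures  is  not  spelled  out  here.  The  sketch
below  assumes  the  common  convention  of  charging  substitutions,  deletions,  and  insertions
against  the  number  of  reference  words,  using  a  minimum  edit  distance  alignment.  The
function  name  and  example  sentences  are  illustrative  only.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            dist[i][j] = min(substitute, delete, insert)
    return 1.0 - dist[-1][-1] / len(ref)

# One deleted word out of five reference words
score = word_accuracy("where is the nearest hospital", "where is nearest hospital")
```

Because  insertions  are  charged  as  errors,  a  measure  defined  this  way  can  fall  below  zero
for  a  hypothesis  that  is  much  longer  than  the  reference.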

11.2.2  Natural  Language  Processing:  The  Tina  System 

A  new  natural  language  system,  TINA,  was  developed  in  our  group  [266].  It  integrates
key  ideas  from  context-free  grammars,  Augmented  Transition  Networks  (ATN’s)  [296],  and
Lexical  Functional  Grammars  (LFG’s)  [64].  TINA  is  specifically  designed  to  accommodate
full  integration  between  speech  recognition  and  natural  language  processing,  and  has  a  set 
of  features  reflecting  this  philosophy. 

The  grammar  begins  with  a  set  of  context-free  rewrite  rules,  which  are  augmented  with 
parameters  to  enforce  syntactic  and  semantic  constraints.  These  rules  are  converted  automatically
to  a  network  form,  leading  to  extensive  structure  sharing.  All  arcs  in  the  network
have  associated  probabilities,  which  can  be  trained  automatically  from  a  set  of  parsed  sentences.
The  parser  uses  a  best-first  search  strategy.  Control  includes  both  top-down  and
bottom-up  cycles,  and  key  parameters  are  passed  among  nodes  to  deal  with  long-distance
movement  and  agreement  constraints.  The  probabilities  provide  a  natural  mechanism  for
exploring  more  common  grammatical  constructions  first.  TINA  also  includes  a  new  strategy
for  dealing  with  movement,  which  efficiently  handles  nested  and  chained  gaps  and  rejects
crossed  gaps.
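
The  way  arc  probabilities  steer  a  best-first  search  toward  common  constructions  can  be
sketched  as  follows.  This  is  a  generic  illustration  over  a  small  acyclic  network,  not  TINA’s
parser:  the  node  names,  arc  labels,  and  probabilities  are  invented,  and  the  parameter
passing  and  gap  handling  described  above  are  omitted.

```python
import heapq
import math

def best_first_paths(arcs, start, goal):
    """Yield (probability, labels) for start-to-goal paths, most probable first.

    arcs maps a node to a list of (label, next node, probability).
    """
    # Heap entries: (negative log probability, tie-breaker, node, labels so far)
    heap = [(0.0, 0, start, [])]
    counter = 1
    while heap:
        cost, _, node, labels = heapq.heappop(heap)
        if node == goal:
            yield math.exp(-cost), labels
            continue
        for label, nxt, prob in arcs.get(node, []):
            # Costs only grow along a path, so paths pop in order of probability
            heapq.heappush(heap, (cost - math.log(prob), counter, nxt, labels + [label]))
            counter += 1

# Hypothetical network fragment with trained arc probabilities
network = {
    "S": [("wh-question", "Q", 0.7), ("statement", "D", 0.3)],
    "Q": [("aux", "END", 0.6), ("be", "END", 0.4)],
    "D": [("np-vp", "END", 1.0)],
}
ordered = list(best_first_paths(network, "S", "END"))
```

The  first  path  produced  is  the  most  probable  one,  so  a  parser  organized  this  way  reaches
frequent  constructions  before  rare  ones.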

Over  the  past  year,  TINA  has  been  ported  to  the  DARPA  1000-word  Resource  Management
task.  We  used  the  791  designated  training  sentences  and  200  (unseen)  test  sentences  to 
evaluate  our  parser  for  coverage  and  perplexity.  The  training  was  a  two-step  process.  We 
first  expanded  the  coverage  of  the  grammar  until  it  could  handle  all  of  the  791  training 
sentences  (100%  coverage).  We  then  built  a  new  subgrammar  from  these  sentences,  with 
probabilities  on  arcs  updated  according  to  their  usage  within  the  training  set  (any  rules 
that  only  appeared  in  the  TIMIT  domain  were  automatically  discarded).  This  resulted  in 
a  grammar  that  was  tightly  defined  for  the  RM  task.  We  then  tested  this  grammar  for 
coverage  and  perplexity  on  the  200  test  sentences.  We  found  that  84%  of  the  test
sentences  were  parsable,  and  the  perplexity  was  368  if  all  words  that  could  follow  each  word
were  considered  to  be  equally  likely.  Surprisingly,  the  perplexity  dropped  roughly
nine-fold,  to  41.5,  when  arc  probabilities  were  incorporated  into  the  measurement.  We  also
looked  at  the  parses  to  establish  the  depth  from  the  top  of  the  correct  parse.  We  found  that 
88%  of  the  training  sentences  gave  a  correct  parse  as  the  first  choice;  this  number  increased 
to  90%  for  the  test  sentences.  Both  sets  gave  the  correct  parse  within  the  top  three  over 
98%  of  the  time. 
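
The  two  perplexity  figures  differ  only  in  the  probability  assigned  to  each  word.  Under  the
standard  definition,  assumed  here,  perplexity  is  the  exponential  of  the  average  negative
log  probability  per  word;  with  equally  likely  successors  it  reduces  to  the  geometric  mean
of  the  number  of  allowed  next  words.  The  sketch  below  uses  invented  numbers,  not  the  RM
data.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Hypothetical two-word sequence: 4 words allowed at the first position, 16 at the second
uniform = perplexity([1 / 4, 1 / 16])   # equally likely successors
trained = perplexity([0.5, 0.5])        # probabilities concentrated on the words used
```

In  this  toy  case  the  uniform  measure  is  the  geometric  mean  of  4  and  16,  and  concentrating
probability  on  the  words  actually  used  lowers  the  figure,  which  mirrors  the  drop  from  368
to  41.5  (a  ratio  of  about  8.9)  reported  above.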

11.2.3  Spoken  Language  Understanding:  The  Voyager  System 

Over  the  past  year,  we  initiated  an  effort  in  spoken  language  understanding.  The  project  is 
motivated  by  our  belief  that  many  of  the  applications  suitable  for  human/machine  interaction 
using  speech  typically  involve  interactive  problem  solving.  That  is,  in  addition  to  converting 
the  speech  signal  to  text,  the  computer  must  also  understand  the  linguistic  structure  of  a 
sentence  in  order  to  generate  the  correct  response. 

In  order  to  explore  issues  related  to  a  fully-interactive  spoken  language  system,  we  selected 
a  task  in  which  the  system  knows  about  the  physical  environment  of  a  specific  geographical 
area,  and  can  provide  assistance  on  how  to  get  from  one  location  to  another  within  this 
area.  The  system,  which  we  call  VOYAGER,  can  also  provide  information  concerning  certain 
objects  located  inside  this  area.  The  current  version  of  VOYAGER  focuses  on  the  geographic 
area  of  the  city  of  Cambridge  between  MIT  and  Harvard  University,  as  shown  in  Figure  11.3, 
and  can  answer  a  number  of  different  types  of  questions  about  certain  hotels,  restaurants, 
hospitals,  and  other  objects  within  this  region. 

VOYAGER  is  made  up  of  three  components.  The  first  component,  SUMMIT,  converts  the 
speech  signal  into  a  set  of  word  hypotheses.  The  natural  language  component,  TINA,  then 
provides  a  linguistic  interpretation  of  the  set  of  words.  The  parse  generated  by  the  natural 
language  component  is  then  transformed  into  a  set  of  query  functions,  which  is  passed  to 
the  backend  for  response  generation.  The  backend  is  an  enhanced  version  of  the  direction 
assistance  program  developed  by  Jim  Davis  of  the  Media  Laboratory  at  MIT.  The  response 
generator  maintains  some  knowledge  about  recent  discourse  history,  which  allows  it  to  respond
appropriately  to  queries  such  as  “How  do  I  get  there?”  Currently,  VOYAGER  can
generate  responses  in  the  form  of  text,  graphics,  and  synthetic  speech.

Figure  11.3:  A  display  showing  the  geographical  region  known  to  the  VOYAGER  system.

As  of  now,  VOYAGER  has  a  vocabulary  of  over  300  words,  and  it  can  deal  with  about  half  a 
dozen  types  of  queries,  such  as  the  location  of  objects,  simple  properties  of  objects,  how  to 
get  from  one  place  to  another,  and  the  distance  and  time  for  travel  between  objects.  Within 
this  limited  domain  of  knowledge,  it  is  our  hope  that  VOYAGER  will  be  able  to  handle  any 
reasonable  query  that  a  native  speaker  is  likely  to  initiate.  As  time  progresses,  VOYAGER’s 
knowledge  base  will  undoubtedly  grow. 

In  order  to  evaluate  VOYAGER’s  performance,  we  collected  a  corpus  of  some  5000  spontaneously
spoken  sentences  from  100  speakers.  The  system  was  trained  on  approximately  70%
of  the  data  and  tested  on  10%.  Errors  in  the  system  can  occur  in  several  ways:  the  recognizer
can  misrecognize  a  word,  the  natural  language  system  can  fail  to  generate  a  parse,  an
unknown  word  can  appear,  or  a  query  can  be  outside  of  VOYAGER’s  domain.  All  in  all,  the
system  could  correctly  execute  approximately  50%  of  the  queries  during  a  recent  evaluation. 

The  current  implementation  of  VOYAGER  makes  use  of  a  Macintosh  II,  augmented  with  DSP 
boards,  for  data  capture  and  signal  processing.  Subsequent  phonetic  classification,  lexical 
access,  linguistic  analysis,  and  response  generation  are  all  performed  on  a  Sun  workstation.
The  overall  response  time  is  approximately  15  times  real  time.  Refined  algorithms,  together
with  the  availability  of  faster  workstations  and  more  powerful  signal  processing  chips,  should
enable  the  current  VOYAGER  implementation  to  run  in  real  time  in  the  future.

11.2.4  Phonetic  Recognition  Using  Multi-layer  Perceptrons 

Over  the  past  two  years,  we  have  been  experimenting  with  the  use  of  artificial  neural  networks 
(ANN)  for  vowel  classification.  Our  work  was  motivated  by  the  belief  that  such  networks 
might  offer  a  flexible  framework  for  us  to  utilize  our  improved,  albeit  incomplete,  speech
knowledge.  Using  the  output  of  Seneff’s  auditory  model  as  the  input  to  multi-layer
perceptrons  (MLP)  with  one  hidden  layer,  classification  accuracies  ranging  from  62%  to  100%
were  achieved  under  varying  experimental  conditions.  These  results  compared  favorably  to
those  of  human  listeners  and  traditional  pattern  classification  techniques.  These  experiments
helped  us  gain  a  better  understanding  of  the  behaviors  of  this  particular  kind  of  ANN  along 
many  dimensions,  including  the  effects  of  training  procedures,  the  nature  of  the  hidden  layer, 
and  the  use  of  adaptation  techniques  on  classification  accuracy.  Some  of  these  results  have 
been  documented  in  several  publications  [192],  Leung-88b,  Leung-89. 
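
For  readers  unfamiliar  with  the  architecture,  the  forward  pass  of  a  multi-layer  perceptron
with  one  hidden  layer  can  be  sketched  as  below.  The  sigmoid  hidden  units,  the  softmax
output,  and  all  weights  are  assumptions  made  for  illustration;  the  activation  functions,
layer  sizes,  and  training  procedures  actually  used  in  these  experiments  are  not  given  here.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Map an input feature vector to class probabilities through one hidden layer."""
    # Hidden layer: weighted sum of inputs followed by a sigmoid (assumed)
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of hidden activations, one score per class
    scores = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_out, b_out)]
    # Softmax (assumed), shifted by the maximum score for numerical stability
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network: 2 inputs, 2 hidden units, 3 vowel classes
probs = mlp_forward(
    [0.0, 0.0],
    w_hidden=[[1.0, 1.0], [1.0, -1.0]], b_hidden=[0.0, 0.0],
    w_out=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], b_out=[0.0, 0.0, 0.0],
)
top_choice = max(range(len(probs)), key=probs.__getitem__)
```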

More  recently,  we  have  expanded  our  earlier  work  by  moving  towards  the  classification  and 
recognition  of  all  phonemes  in  American  English  [193].  By  incorporating  novel  normalization 
and  training  procedures,  we  were  able  to  obtain  a  context-independent  classification  accuracy 
of  74%  for  38  phones,  using  as  input  a  set  of  80  automatically  determined  acoustic  attributes. 
The  same  system  configuration  achieved  a  phonetic  recognition  accuracy  of  55%,  including 
substitution,  insertion,  and  deletion  errors.  We  are  in  the  process  of  incorporating  the  ANN 
phonetic  recognition  module  into  the  SUMMIT  system  and  evaluating  its  impact  on  overall 
system  performance. 

11.2.5  Isolated  Word  Recognition  over  Telephone  Network 

Over  the  past  year,  we  initiated  an  effort  to  develop  a  small-vocabulary,  isolated-word  recognition
system.  The  focus  of  this  research  is  to  explore  how  our  phonetically-  and  segmentally-based
approach  will  fare  with  the  bandlimited  and  distorted  speech  transmitted  through  local
and  long  distance  telephone  networks,  spoken  by  real  users.

As  a  first  step,  we  implemented  a  system  that  recognizes  25  city  names,  and  have  performed 
a  number  of  recognition  experiments  using  data  from  real  users  over  dial  up  telephone  lines 
collected  by  NYNEX.  Preliminary  evaluation  of  our  system  on  such  realistic  data  showed 
that  a  top-choice  accuracy  of  95%  can  be  achieved  with  a  20%  rejection  criterion.  We  are 
also  using  this  task  as  a  framework  in  which  to  explore  the  use  of  unsupervised  learning 
techniques  to  enable  the  automatic  expansion  and  modification  of  the  vocabulary. 
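
The  interaction  between  rejection  and  accuracy  can  be  illustrated  with  the  sketch  below.
It  assumes  that  the  “rejection  criterion”  means  discarding  the  stated  fraction  of  utterances
with  the  lowest  recognizer  confidence  and  scoring  only  the  remainder;  the  exact  procedure
used  in  the  evaluation  is  not  described  here,  and  the  scores  shown  are  invented.

```python
def accuracy_with_rejection(results, rejection_rate):
    """Top-choice accuracy after rejecting the lowest-confidence utterances.

    results: list of (confidence score, top choice was correct) pairs.
    rejection_rate: fraction of utterances to reject, between 0 and 1.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n_keep = len(ranked) - int(round(rejection_rate * len(ranked)))
    kept = ranked[:n_keep]
    return sum(1 for _, correct in kept if correct) / len(kept)

# Hypothetical results for ten utterances
results = [
    (0.90, True), (0.80, True), (0.70, True), (0.60, False), (0.50, True),
    (0.40, True), (0.30, True), (0.20, True), (0.15, False), (0.10, False),
]
no_rejection = accuracy_with_rejection(results, 0.0)
with_rejection = accuracy_with_rejection(results, 0.2)
```

Rejection  helps  only  to  the  extent  that  errors  concentrate  among  low-confidence
utterances,  as  they  do  in  this  toy  data.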

11.3  Student  Reports 

Nancy  Daly 

Nancy  is  pursuing  a  doctoral  thesis  on  prosodic  aids  for  speech  recognition.  Prosody  is  the 
stress,  rhythm,  and  intonation  of  speech.  While  the  importance  of  prosodic  information 
has  long  been  documented  for  human  speech  communication,  automatic  speech  recognition 
systems  developed  thus  far  have  all  but  ignored  this  source  of  information.  The  purpose 
of  her  thesis  research  is  to  see  how  prosodic  information  could  be  incorporated  into  speech 
recognition  systems  to  improve  their  performance. 

Currently,  Nancy  is  investigating  the  problem  of  distinguishing  yes/no  questions  from  “wh-”
questions  and  other  types  of  sentences.  Her  investigation  of  this  problem  spans  several
directions.  First,  she  is  attempting  to  document  how  this  type  of  prosodic  encoding  is 
achieved  by  a  talker,  by  asking  listeners  to  categorize  questions  during  listening  tests.  Next, 
she  is  investigating  the  intonation  contour  in  order  to  establish  whether  a  terminal  high 
boundary  tone  is  actually  present  for  yes/no  questions.  Finally,  she  is  interested  in  devising 
algorithms  for  automatic  extraction  of  acoustic  attributes,  leading  to  the  detection  of  these 
high  boundary  tones.

David  Goddeau

Dave  joined  the  group  as  a  graduate  student  in  September  1989.  During  the  summer,  he 
worked  for  the  group  porting  the  SUMMIT  auditory  model  to  C  and  then  optimizing  it  to  run
on  a  DSP  board.  During  the  fall  term,  he  contributed  to  porting  other  sections  of  the  system 
to  the  DSP  board  and  became  familiar  with  the  computing  environment  of  the  group.  He  is 
exploring  several  ideas  in  his  search  for  a  PhD  thesis  topic.  The  current  focus  of  his  work  is 
the  detection  of  unknown  words  in  a  human/machine  spoken  dialog. 

Lee  Hetherington 

Lee  joined  the  group  in  September  1989.  He  spent  the  fall  term  becoming  familiar  with  the 
group’s  computational  facilities  and  attended  spectrogram  reading  classes.  He  is  searching  for 
a  Doctoral  thesis  topic  in  the  area  of  adaptation  and  learning  for  improved  speech  recognition. 
Lee  is  currently  working  on  the  problem  of  adding  new  words  to  the  vocabulary  of  SUMMIT. 
Specifically,  he  will  be  working  on  unsupervised  learning  of  the  pronunciation  network  and/or 
the  phonetic  models  when  a  new  word  is  added  to  the  vocabulary.  The  goal  is  to  enable  the 
recognizer  to  improve  with  use. 

Rob  Kassel 

Rob  just  completed  his  Master’s  thesis  entitled  “An  Information-Theoretical  Approach  to
Studying  Phoneme  Collocational  Constraints.”  Linguistic  constraints  are  well  known  and
well  exploited  at  the  syntactic  and  semantic  levels.  Past  work  in  phonology  has  also  suggested
the  existence  of  strong  constraints  on  phoneme  sequences,  which  are  best  expressed  in  terms
of  phonological  equivalence  classes.  However,  these  constraints  are  typically  stated
introspectively.  Even  when  they  are  quantified,  the  equivalence  classes  are  usually
pre-defined  by  linguists.  This  research  studied  the  collocational  constraints  of  phonemes
using  a  data-driven,  self-organizing  approach.  Information-theoretic  metrics  were  used  to
discover  phonological  equivalence  classes  that  can  best  capture  the  constraints.  A  major  goal  of  his  research  was
to  compare  the  constraining  power  of  these  equivalence  classes  with  those  of  the  distinctive 
features  as  suggested  from  phonological  theory. 
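
One  information-theoretic  metric  that  could  serve  this  purpose  is  the  mutual  information
between  the  classes  of  adjacent  phonemes;  whether  the  thesis  used  this  particular  metric  is
not  stated  here,  so  the  sketch  below  is  an  assumption.  The  class  assignments  and  function
names  are  hypothetical.

```python
import math
from collections import Counter

def adjacent_class_pairs(phonemes, class_of):
    """Map a phoneme sequence to its sequence of adjacent (class, class) pairs."""
    classes = [class_of[p] for p in phonemes]
    return list(zip(classes, classes[1:]))

def mutual_information(pairs):
    """Mutual information, in bits, between the two members of each pair."""
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), n in joint.items():
        p_ab = n / total
        # p(a, b) / (p(a) * p(b)), written in terms of raw counts
        mi += p_ab * math.log2(n * total / (left[a] * right[b]))
    return mi

# Hypothetical equivalence classes and a toy phoneme string
class_of = {"s": "fricative", "t": "stop", "a": "vowel", "r": "liquid"}
pairs = adjacent_class_pairs(["s", "t", "r", "a", "s", "t", "a"], class_of)
bits = mutual_information(pairs)
```

A  class  inventory  that  yields  higher  mutual  information  between  neighbors  captures  more  of
the  sequential  constraint,  which  gives  one  way  to  compare  data-derived  classes  against
those  defined  by  distinctive  features.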

Jeffrey  Marcus 

Jeff  continued  work  on  his  thesis,  tentatively  entitled  “Incorporating  Units  of  Different  Sizes 
in  Segment-based  Speech  Recognition.”  One  goal  of  the  thesis  is  to  extend  current  techniques
for  combining  statistical  estimates  of  recognizer  model  parameters  made  on  phonetic
units  of  various  sizes,  such  as  phones,  diphones  and  words.  Another  goal  is  to  incorporate 
measurements  made  over  acoustic  segments  of  various  sizes,  for  instance  over  phone-like  and 
diphone-like  units.  This  contrasts  with  most  current  speech  recognizers,  which  make  measurements
over  constant-duration  time  frames.  Towards  these  ends,  Jeff  worked  for  some
time  on  refining  an  acoustic  segmentation  algorithm  with  the  aim  of  obtaining  segments 
which  map  to  phonetic  units  in  a  predictable  manner.  Over  the  next  year,  he  will  be  focusing
on  the  modeling  of  function  words,  such  as  “the”  and  “are,”  from  the  VOYAGER
database,  using  the  techniques  mentioned  above.

In  addition  to  this  research,  he  also  worked  on  the  problem  of  statistically  comparing  the
performance  of  two  speech  recognizers.  He  presented  this  work  at  the  European  Conference 
on  Speech  Communication  and  Technology. 

Helen  Meng 

Helen  is  interested  in  designing  and  implementing  a  novel  representation  of  speech  using 
distinctive  features  for  speech  recognition.  Distinctive  features  are  a  small  set  of  orthogonal 
properties  used  to  classify  both  phonemes  and  other  levels  of  phonology  and  phonetics. 
The  compact  inventory  of  distinctive  features  possesses  immense  descriptive  power,  and  can 
concisely  represent  speech  variations  such  as  coarticulatory  phenomena,  contextual  effects 
as  well  as  inter-speaker  differences.  It  is  believed  that  these  properties  of  distinctive  features 
are  extremely  beneficial  for  automatic  speech  recognition. 

Recent  research  has  focused  on  the  comparison  of  auditory-based  acoustic  representations. 
The  objective  of  this  work  is  to  find  a  favorable  front  end  for  future  distinctive  feature 
extraction.  The  mel-based  signal  representations  have  been  implemented,  and  compared 
with  Seneff’s  auditory  model  on  the  basis  of  vowel  classification,  using  the  artificial  neural 
network  developed  by  Hong  Leung.  Further  comparisons  will  be  made  on  the  basis  of  acoustic 
segmentation,  and  the  acoustic  correlates  of  the  distinctive  features  will  be  characterized  and 
quantified  for  the  purpose  of  feature  extraction.  This  line  of  investigation  will  lead  to  the 
transformation  of  an  acoustic  representation  into  a  novel  representation  in  terms  of  distinctive 
features,  and  the  resulting  recognition  performance  will  be  evaluated. 

Partha  Niyogi 

Since  joining  the  group  in  September  1989,  Partha  has  been  familiarizing  himself  with  the 
research  activities  of  the  group  (mainly  through  reading  papers)  and  its  computational  facilities.
In  addition  to  his  coursework,  he  has  been  attending  spectrogram  reading  classes.

Partha  has  also  been  searching  for  a  suitable  Master’s  thesis  topic.  In  particular,  he  is 
investigating  how  speaker-independent  phonetic  recognition  performance  can  be  improved 
by  considering  the  correlation  in  the  acoustic  space  of  different  phonemes  due  to  vocal  tract 
constraints.  He  will  explore  both  clustering  and  normalization  techniques  to  reduce  the 
variance  of  acoustic  parameters,  thus  leading  to  better  recognition  performance. 

John  F.  Pitrelli 

John  just  completed  his  Ph.D.  thesis  entitled  “Hierarchical  Modelling  of  Phoneme  Duration: 
Application  to  Speech  Recognition.”  Duration  is  potentially  a  strong  cue  for  certain  phonemic
distinctions,  including  inherently  long  vs.  short  vowels,  and  voiced  vs.  unvoiced  obstruent
consonants.  Phoneme  durations  are  affected,  though,  by  an  abundance  of  factors  ranging 
from  detailed  phonetic  context  effects  to  syntax  and  semantics.  Our  lack  of  understanding  of 
these  effects  and  their  interactions  hinders  our  use  of  potentially  useful  duration  information 
to  the  extent  that  most  speech  recognition  systems  currently  use  only  rudimentary  duration 
models  or  use  time-warping  procedures,  which  distort  duration  information. 

His  approach  to  the  duration  modeling  problem  was  to  use  a  hierarchical  model  to  account 
for  discrete-valued  factor  variables,  such  as  phonetic  context  features  and  syntactic-unit-final
lengthening.  Research  was  focused  on  three  areas.  One  was  to  search  for  ways  to  measure
speaking  rate  which  would  be  suitable  for  use  in  a  duration  model  for  recognition,  and  to
explore  the  effects  of  speaking  rate  on  phoneme  duration.  The  second  was  speech  synthesis,
where  experiments  involved  replacing  the  duration  model  in  a  speech  synthesizer  with
the  hierarchical  model,  and  performing  perceptual  tests  to  determine  whether  naturalness 
and/or  intelligibility  were  improved.  The  third  area  was  recognition  experiments  designed 
to  determine  how  much  the  duration  model  improved  recognizer  performance.  John  pursued 
two  lines  of  experimentation  in  this  area.  One  was  to  incorporate  his  duration  model  into 
an  existing  recognizer,  to  assess  how  much  duration  information  adds  to  the  performance 
achieved  using  other  information.  The  other  was  to  measure  the  potential  discrimination 
power  of  duration  information  by  evaluating  a  duration-only  classifier  on  particular  distinctions
associated  with  duration  effects.

Michal  Soclof 

Michal  just  completed  her  Master’s  thesis  entitled  “A  Comparison  of  Spontaneous  Speech  and 
Read  Speech  in  Human-machine  Problem  Solving  Dialogues.”  The  purpose  of  her  research 
was  to  analyze  and  quantify  the  phenomena  that  occur  in  spontaneous  speech,  and  compare 
them  to  read  speech.  Spontaneous  speech  is  defined  in  this  context  as  speech  generated  by  a 
person  when  talking  to  a  computer  in  a  problem  solving  situation,  as  opposed  to  when  talking 
to  another  individual.  The  ultimate  goal  of  an  interactive  human-machine  interface  through 
speech  is  to  enable  the  user  to  communicate  with  the  machine  using  spontaneous  speech. 
In  order  to  build  this  type  of  system,  it  is  necessary  to  study  the  acoustic  and  linguistic 
variations  that  occur  in  goal-directed,  spontaneously  uttered  speech.  The  phenomena  that
occur  in  spontaneous  speech  include  agrammatical  and  ill-formed  sentences,  false  starts,  and 
non-speech  vocalizations. 

In  order  to  conduct  this  study,  a  large  corpus  of  spontaneous  utterances  was  gathered.  The 
corpus  also  included  a  read  version  of  each  of  the  spontaneous  sentences.  The  analysis  of 
the  data  was  divided  into  three  categories:  frequency  analysis,  natural  language  analysis, 
and  acoustic/phonetic  analysis.  The  frequency  analysis  included  studying  the  frequency 
of  occurrence  in  read  and  spontaneous  speech  of  non-speech  vocalizations  such  as  mouth 
clicks,  filled  pauses,  unfilled  pauses,  and  false  starts.  The  natural  language  analysis  entailed 
finding  the  location  of  the  spontaneous  phenomena  within  the  sentence  structure.  The 
acoustic/phonetic  analysis  was  done  at  the  sentence,  word,  and  phoneme  levels.  This  included
studying  the  duration  of  the  read  vs.  spontaneous  sentences,  and  the  duration  of  filled  and 
unfilled  pauses. 
11.4  Publications

[1]  S.  Seneff.  Probabilistic  parsing  for  spoken  language  applications.  In  Proceedings  of  the
International  Workshop  on  Parsing  Technologies,  Pittsburgh,  PA,  1989.

[2]  J.  Pitrelli.  A  hierarchical  model  for  phoneme  duration  in  American  English.  In  Proceedings
of  European  Conference  on  Speech  Communication  and  Technology,  pages  324-327,
Paris,  France,  1989.

[3]  J.  Marcus.  Significance  tests  for  comparing  speech  recognizer  performance  using  small 
test  sets.  In  Proceedings  of  European  Conference  on  Speech  Communication  and  Technology,
pages  465-468,  Paris,  France,  1989.

[4]  V.  Zue,  S.  Seneff,  and  J.  Glass.  Speech  database  development:  TIMIT  and  beyond.  In 
Proceedings  of  the  Workshop  on  Speech  Input/Output  Assessment  and  Speech  Databases, 
Amsterdam,  The  Netherlands,  1989. 

[5]  V.  Zue,  J.  Glass,  D.  Goodine,  H.  Leung,  M.  Phillips,  J.  Polifroni,  and  S.  Seneff.  The 
VOYAGER  speech  understanding  system:  a  progress  report.  In  Proceedings  of  the  Second 
DARPA  Speech  and  Natural  Language  Workshop,  pages  51-59,  Harwichport,  MA,  1989.

[6]  V.  Zue,  J.  Glass,  D.  Goodine,  H.  Leung,  M.  Phillips,  J.  Polifroni,  S.  Seneff,  and  M. 
Soclof.  The  collection  and  preliminary  analysis  of  a  spontaneous  speech  database.  In 
Proceedings  of  the  Second  DARPA  Speech  and  Natural  Language  Workshop,  pages  126-134,
Harwichport,  MA,  1989.

[7]  V.  Zue,  J.  Glass,  D.  Goodine,  H.  Leung,  M.  Phillips,  J.  Polifroni,  and  S.  Seneff.  Preliminary
evaluation  of  the  VOYAGER  spoken  language  system.  In  Proceedings  of  the
Second  DARPA  Speech  and  Natural  Language  Workshop,  pages  160-167,  Harwichport, 
MA,  1989. 

[8]  V.  Zue,  J.  Glass,  D.  Goodine,  M.  Phillips,  and  S.  Seneff.  The  SUMMIT  speech  recognition 
system:  phonological  modeling  and  lexical  access.  In  Proceedings  of  ICASSP,  pages  49-52,
Albuquerque,  NM,  1990.

[9]  V.  Zue,  J.  Glass,  D.  Goodine,  H.  Leung,  M.  Phillips,  J.  Polifroni,  and  S.  Seneff.  The 
VOYAGER  speech  understanding  system:  preliminary  development  and  evaluation.  In 
Proceedings  of  ICASSP,  pages  73-76,  Albuquerque,  NM,  1990. 

[10]  H.  Leung  and  V.  Zue.  Phonetic  classification  using  multi-layer  perceptrons.  In  Proceedings
of  ICASSP,  pages  525-528,  Albuquerque,  NM,  1990.

Theses  Completed

[1]  R.  Kassel.  An  Information-theoretical  Approach  to  Studying  Phoneme  Collocational 
Constraints.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer 
Science,  May  1990.  Supervised  by  V.W.  Zue. 
[2]  A.  Lim.  A  New  Approach  to  Speech  Recognition  in  the  Presence  of  Co-channel  Speech
Interference.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer 
Science,  May  1990.  Supervised  by  G.  Kopec  and  V.W.  Zue. 

[3]  K.  Ng.  A  Comparative  Study  of  the  Practical  Characteristics  of  Neural  Network  and 
Conventional  Pattern  Classifiers.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering
and  Computer  Science,  May  1990.  Supervised  by  R.P.  Lippmann  and  V.W.
Zue. 

[4]  J.  Pitrelli.  Hierarchical  Modelling  of  Phoneme  Duration:  Application  to  Speech  Recognition.
PhD  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,
May  1990.  Supervised  by  V.W.  Zue. 

[5]  M.  Soclof.  A  Comparison  of  Spontaneous  Speech  and  Read  Speech  in  Human-Machine 
Problem  Solving  Dialogues.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering 
and  Computer  Science,  May  1990.  Supervised  by  V.W.  Zue. 

[6]  D.  Whitney.  Building  a  Paradigm  to  Elicit  a  Dialog  with  a  Spoken  Language  System. 
Bachelor’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science, 
May  1990.  Supervised  by  V.W.  Zue. 

Theses  in  Progress 

[1]  N.  Daly.  Prosodic  Aids  to  Speech  Recognition.  PhD  thesis,  MIT  Department  of  Electrical
Engineering  and  Computer  Science,  expected  1991.  Supervised  by  V.W.  Zue.

[2]  D.  Goddeau.  Detecting  and  Parsing  Unknown  Words  in  Speech  Understanding  Systems. 
PhD  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  expected 
1991.  Supervised  by  V.W.  Zue. 

[3]  I.L.  Hetherington.  Supervised  and  Unsupervised  Incremental  Training  in  Speech  Recognition.
PhD  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,
expected  1992.  Supervised  by  V.W.  Zue. 

[4]  J.  Marcus.  Building  Acoustic  Models  of  Words  and  Phrases  for  Speech  Recognition 
Using  Variable-sized  Lexical  and  Acoustic  Units  and  a  Data  Analytic  Approach.  PhD 
thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  expected  1991. 
Supervised  by  V.W.  Zue. 

[5]  H.  Meng.  A  Comparison  of  Auditory-based  Representations  of  Speech  and  the  Use  of 
Distinctive  Features  as  an  Intermediate  Representation  for  Automatic  Speech  Recognition.
Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,
expected  1990.  Supervised  by  V.W.  Zue. 

[6]  P.  Niyogi.  A  Study  of  the  Correlation  in  Acoustic  Space  of  Different  Phonemes  Due
to  Vocal  Tract  Constraints.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering 
and  Computer  Science,  expected  1991.  Supervised  by  V.W.  Zue. 
Talks

[1]  V.W.  Zue.  Yesterday,  today,  and  tomorrow,  on  the  state  of  voice  I/O  technology. 
Keynote  Address  given  at  the  Annual  Meeting  of  the  American  Voice  Input/Output
Society,  Newport  Beach,  CA,  September  1989. 

[2]  V.W.  Zue.  MIT’s  Voyager  spoken  language  system.  Distinguished  Lecture  given  at 
Oregon  Graduate  Center,  Beaverton,  OR,  November  1989. 

[3]  V.W.  Zue.  Computers  that  listen:  recent  progress  at  MIT.  Lecture  given  at  MIT  EECS 
Colloquium,  November  1989. 

[4]  V.W.  Zue.  Dennis  Klatt’s  contribution  to  automatic  speech  recognition.  Invited  lecture
given  at  the  118th  Meeting  of  the  Acoustical  Society  of  America,  St.  Louis,  MO,
November  1989. 

[5]  S.  Seneff.  A  natural  language  system  for  spoken  language  applications.  Invited  lecture 
given  at  the  ATR  Symposium  on  Basic  Research  for  Telephone  Interpretation,  Kyoto, 
Japan,  December  1989. 

[6]  V.W.  Zue.  Future  directions  for  spoken  language  research.  Invited  lecture  given  at 
the  ATR  Symposium  on  Basic  Research  for  Telephone  Interpretation,  Kyoto,  Japan, 
December  1989. 

[7]  V.W.  Zue.  Computers  that  listen:  recent  progress  at  MIT.  MIT  ILO  Seminar,  Tokyo, 
Japan,  February  1990. 

[8]  V.W.  Zue.  Assessment  of  speech  I/O  technology.  Keynote  Address  given  at  the
Citicorp-TTI  Voice  Technology  Seminar,  Santa  Monica,  CA,  March  1990. 

[9]  V.W.  Zue.  Computers  that  listen:  recent  progress  at  MIT.  Lecture  given  at  the  MIT 
Industrial  Liaison  Program  and  MIT  International  Financial  Services  Research  Center 
Seminar,  Cambridge,  MA,  May  1990.
12.1  Objectives

Our  main  goal  is  to  develop  techniques  and  tools  that  will  facilitate  the  efficient  production  of 
high  quality  software.  Most  of  our  attention  is  concentrated  on  exploring  the  role  of  formal
specifications  in  building  programs  from  components,  encouraging  module-level  reuse,  and 
allowing  early  detection  of  design  errors. 

12.2  Approach 

Effective  programming  involves  choosing  abstractions  that  can  be  combined  to  solve  a  problem,
specifying  their  meanings,  and  implementing  them.  It  is  better  to  think  about  combining
abstractions  than  about  combining  implementations  because:  1)  specifications  are  easier  to 
understand  than  implementations,  2)  software  is  easier  to  maintain  if  it  relies  only  on  properties
guaranteed  by  specifications,  and  3)  components  are  more  likely  to  be  reusable  if  a
distinction  is  made  between  abstractions  and  implementations. 
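
As  an  illustrative  sketch  of  points  1)  through  3)  (not  code  from  the  Larch  project),  the  following
Python  fragment  separates  an  abstraction  from  two  implementations.  The  names  IntSet,  ListIntSet,
HashIntSet,  and  count_distinct  are  hypothetical,  and  the  docstring  is  only  an  informal  stand-in
for  a  formal  specification.

```python
from abc import ABC, abstractmethod


class IntSet(ABC):
    """Abstraction: a finite set of integers.

    Informal specification (a stand-in for a formal one):
      - after insert(x), member(x) is True and no other membership changes;
      - size() is the number of distinct members.
    """

    @abstractmethod
    def insert(self, x): ...

    @abstractmethod
    def member(self, x): ...

    @abstractmethod
    def size(self): ...


class ListIntSet(IntSet):
    """Implementation backed by an unsorted list."""

    def __init__(self):
        self._items = []

    def insert(self, x):
        if x not in self._items:
            self._items.append(x)

    def member(self, x):
        return x in self._items

    def size(self):
        return len(self._items)


class HashIntSet(IntSet):
    """Implementation backed by a built-in hash set."""

    def __init__(self):
        self._items = set()

    def insert(self, x):
        self._items.add(x)

    def member(self, x):
        return x in self._items

    def size(self):
        return len(self._items)


def count_distinct(values, make_set):
    """Client code: relies only on the IntSet specification."""
    s = make_set()
    for v in values:
        s.insert(v)
    return s.size()
```

Because  count_distinct  depends  only  on  the  properties  stated  for  IntSet,  either  implementation
can  be  supplied  without  changing  the  client.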

The  Larch  family  of  specification  languages  is  unique  in  the  way  it  supports  a  two-tiered 
definitional  approach  to  specification.  Each  specification  has  components  written  in  two 
languages:  one  designed  for  a  specific  programming  language  and  another  independent  of 
any  programming  language.  The  former  are  called  Larch  interface  languages,  and  the  latter 
the  Larch  Shared  Language  (LSL).  Larch  allows  one  to  specify  reusable  theories  (at  the 
LSL-level),  as  well  as  interfaces  to  actual  software  components. 

The  Larch  style  of  specification  emphasizes  brevity  and  clarity  rather  than  executability. 
To  make  it  possible  to  test  specifications  without  executing  or  implementing  them,  Larch 
permits  specifiers  to  make  claims  about  logical  properties  of  specifications  and  to  check 
these  claims  at  specification  time.  The  emphasis  in  our  tools  is  thus  on  reasoning  at  the 
component  level,  rather  than  at  the  code  level.  The  most  interesting  tool  is  LP,  a  system 
used  for  reasoning  about  semantic  properties  of  specifications. 

Our  work  on  parallel  programming  is  related  to  our  work  on  specifications  in  that  it  emphasizes
module-level  programming.  Various  methods  have  been  proposed  to  address  the
problem  of  writing  and  reasoning  about  parallel  programs,  but  these  tend  to  address  only 
single  module  programs.  Notions  of  module  composition  are  missing,  and  as  a  result,  a  style 
of  program  development  is  encouraged  in  which  the  entire  program  is  designed  and  implemented
as  a  single  unit.  Conversely,  methods  that  have  been  developed  for  “programming
in  the  large”  of  sequential  programs  are  not  applicable,  and  attempts  to  extend  them  to 
parallel  programs  often  result  in  programs  that  exhibit  very  little  real  concurrency.  Our  goal 
is  to  be  able  to  decompose  parallel  programs  into  independently  specifiable  units  without 
prohibiting  efficient  implementations. 

12.3  Recent  Accomplishments 

We  released  version  2.0  of  the  LP  system  for  reasoning  about  specifications.  LP  is  being 
used  for  reasoning  about  specifications  of  software  components  at  MIT,  CMU  and  Digital 
Equipment  Corporation’s  System  Research  Center  (DEC  SRC)  and  for  reasoning  about 
circuit  descriptions  at  DEC  SRC  and  the  Technical  University  of  Denmark. 
We  completed  a  description  of  how  to  translate  and  formulate  the  necessary  semantic  checks
for  all  of  LSL  using  LP.  This  will  make  it  possible  to  begin  serious  experiments  with  the  use 
of  LP  in  debugging  specifications. 

We  developed  a  novel  approach  to  designing  and  implementing  efficient,  yet  modular,  parallel
programs.  This  approach  was  used  to  implement  one  small  and  one  reasonably  large  parallel 
program. 

12.4  Plans  for  Current  Year 

We  expect  to  complete  the  design  and  preliminary  implementation  of  a  Larch/C  interface 
language.  By  the  end  of  the  fiscal  year,  we  hope  to  have  used  it  to  specify  an  interesting  set 
of  reusable  components  that  can  be  used  by  C  programmers. 

We  plan  to  write  up  our  approach  to  building  parallel  programs.  This  will  include  a  careful 
evaluation  of  the  performance  of  the  programs  we  have  built  using  it,  and  a  discussion  of 
the  strengths  and  weaknesses  of  the  approach. 

We  plan  to  enhance  LP  by  adding  efficient  special  purpose  reasoning  procedures.  These  will 
be  aimed  at  specific  theories  that  have  proved  critical  to  current  users  of  LP. 

We  plan  to  complete  a  new  LSL  implementation  and  connect  it  to  both  LP  and  the  Larch/C 
processor. 

12.5  Collaboration  Outside  LCS 

We  have  collaborated  for  many  years  with  researchers  at  DEC  SRC  on  the  design  of  the  Larch 
family  of  languages.  This  collaboration  will  continue.  SRC  has  been  a  primary  Beta  site  for 
testing  our  software.  They  contributed  many  valuable  suggestions  that  were  incorporated  in 
the  current  release  of  LP.  Finally,  we  have  worked  with  SRC  on  the  specification  of  threads. 

We  have  been  working  with  researchers  at  CMU  on  the  design  of  Larch/C.  Researchers  at 
CMU  have  used  Larch  to  specify  a  variety  of  software  components.  They  have  also  used  LP 
to  reason  about  those  components. 

Researchers  at  the  Technical  University  of  Denmark  have  used  LP  to  reason  about  descriptions
of  circuits.  They  currently  have  software  that  allows  them  to  generate  input  to  a
fabrication  facility  from  descriptions  written  in  a  high  level  circuit  description  language. 
They  are  currently  writing  a  compiler  to  compile  descriptions  written  in  their  language  into 
input  suitable  for  LP  (up  to  now  the  translation  has  been  done  by  hand).  Last  summer  they 
fabricated  a  chip  that  was  verified  using  LP.
[3]  M.B.  Reinhold.  Type  Checking  is  Undecidable  When  “Type”  is  a  Type,  Technical  Report
MIT/LCS/TR-458,  MIT  Laboratory  for  Computer  Science,  December  1989. 

[4]  J.S.  Staunstrup,  S.J.  Garland,  and  J.V.  Guttag.  Localized  verification  of  circuit  descriptions.
In  Proceedings  of  an  International  Workshop  on  Automatic  Verification  Methods
for  Finite  State  Systems,  Grenoble,  France.  Also  Lecture  Notes  in  Computer  Science,
407:349-364,  Springer-Verlag,  1989.
13.1  Introduction

The  MIT  Theory  of  Computation  (TOC)  group  is  one  of  the  largest  theoretical  computer 
science  research  groups  in  the  world.  It  includes  faculty,  students,  and  visitors  from  both  the 
Electrical  Engineering  and  Computer  Science  Department,  and  the  Applied  Mathematics 
Department. 

The  principal  research  areas  investigated  by  members  of  the  TOC  Group  are: 

•  algorithms:  combinatorial,  geometric,  graph-theoretic,  number  theoretic; 

•  cryptology; 

•  computational  complexity; 

•  parallel  computation; 

•  distributed  computation:  algorithms  and  semantics; 

•  machine  learning; 

•  semantics  and  logic  of  programs;  and

•  VLSI  design  theory. 

13.2  Faculty  Reports 

Baruch  Awerbuch 

Awerbuch  worked  on  a  number  of  topics  related  to  design  and  analysis  of  communication 
protocols. 

Also,  Awerbuch  worked  on  many  specific  problems  in  dynamic  networks.  Together  with 
Shavit  and  Mansour  [28],  Awerbuch  discovered  the  first  polynomial  solution  to  the  end-to- 
end  communication  problem.  This  is  one  of  the  basic  network  problems;  it  was  conjectured  in 
[3]  that  it  has  no  polynomial  solution.  Together  with  Goldreich  and  Herzberg  (Technion)  [27],
he  developed  a  quantitative  framework  for  analyzing  performance  of  broadcast  protocols  in 
dynamic  networks.  Together  with  Kutten  (IBM)  and  Cidon  (IBM),  he  discovered  an  efficient 
algorithm  for  maintaining  topology  [25]  in  a  dynamic  network.  In  [24],  those  techniques 
have  been  modified  to  yield  efficient  control  mechanisms  for  new  generation,  fast  hardware-switched
networks  developed  at  IBM.

Together  with  Goldberg  (Stanford),  Luby  (ICSI),  and  Plotkin  (Stanford),  he  found  a  new
technique  [26]  for  removing  randomness  from  distributed  computing  that  has  yielded  fast
deterministic  algorithms  for  Maximal  Independent  Set,  Δ + 1  Coloring,  and  Breadth  First
Search.  Together  with  Peleg  (Weizmann)  and  Baratz  (IBM),  Awerbuch  developed  a  new 
framework  for  cost-sensitive  analysis  of  distributed  algorithms  [23]. 
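
For  readers  unfamiliar  with  the  coloring  problem  named  above,  the  following  Python  sketch  shows
why  one  more  color  than  the  maximum  degree  always  suffices.  It  is  the  ordinary  sequential  greedy
method,  not  the  distributed  deterministic  technique  of  [26];  the  function  name  greedy_coloring
and  the  adjacency-dictionary  input  format  are  assumptions  made  for  illustration.

```python
def greedy_coloring(adj):
    """Color vertices one at a time with the smallest color unused by neighbors.

    adj maps each vertex to a list of its neighbors (undirected graph).
    A vertex of degree d sees at most d used colors, so some color in
    0..d is free; hence at most (maximum degree + 1) colors are used.
    """
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
    return color


# A 5-cycle has maximum degree 2, so at most 3 colors are needed.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(cycle5)
```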

The  emphasis  in  Awerbuch’s  work  is  distributed  data  structures  and  their  application.  In 
[29],  Awerbuch  formalized  the  problem  and  gave  the  first  solution  to  the  problem  of  online 
dynamic  directories,  in  which  both  updates  and  searches  are  efficient.  Also,  he  showed  that 
network  routing  can  be  performed  with  compact  routing  tables,  without  causing  a  significant 
increase  in  communication  [30]. 
Peter  Elias

The  paper  on  error-correcting  codes  under  list  decoding,  which  appeared  as  a  Technical 
Report  at  the  time  of  the  last  progress  report,  has  since  been  accepted  for  publication  [104]. 

The  work  on  iterative  error-correcting  coding  schemes  referred  to  in  the  last  progress  report,
exploring  the  effect  on  performance  of  reusing  the  low  order  check  bits  after  higher  order
bits  have  been  used,  has  led  to  analytical  and  simulation  results  which  show  that  the  reuse  of 
check  bits  brings  performance  much  closer  to  channel  capacity.  That  work  has  been  deferred 
for  most  of  the  past  year  while  the  author  was  pursuing  sociological  research.  A  preliminary 
report  of  the  sociological  work  has  come  out  in  two  parts  [279]  [278],  and  has  been  accepted 
for  publication.  Work  on  the  iterative  codes  will  resume  during  the  summer  of  1990. 

Shafi  Goldwasser 
Research  Topics 

Fairness  in  Distributed  Computation 

Together  with  Leonid  Levin,  we  extended  some  of  our  previous  results  on  fairness  in  distributed
computation  started  in  1988-89  as  outlined  below.

In  1988-89,  Goldwasser  together  with  Beaver  [39]  investigated  the  problem  of  performing 
a  distributed  computation  in  a  network  with  broadcast  channels  and  any  number  of  faulty 
processors.  It  was  shown  how  the  existence  of  an  oblivious  transfer  protocol  is  a  necessary 
and  sufficient  condition  to  compute  any  polynomial  time  boolean  function  defined  on  the 
processors’  private  inputs  privately,  correctly,  and  with  a  novel  property:  fairness.  The  faulty
processors  can  find  out  the  function  value  “if  and  only  if”  the  non-faulty  processors  find  out
the  function  value,  in  a  certain  technical  probabilistic  sense.

It  was  left  open  whether  the  same  is  true  for  non-boolean  functions.  Together  with  Levin
[129],  we  resolve  this  problem.  We  show  how  to  compute  any  function  from  strings  to  strings 
privately,  correctly,  and  fairly  for  a  definition  of  fairness  extended  to  the  non-boolean  case. 

In  the  same  work,  a  set  of  new  definitions  for  “what  should  be  desired  from  a  fault-tolerant 
computation”  in  presence  of  malicious  faults  is  proposed.  The  new  definitions  are  more 
general  and  yet  simpler  than  the  ones  used  previously  in  this  field.  We  show  that  our  new 
requirements  are  achieved  by  previous  solutions  of  [128]  and  others,  and  imply  all  properties 
required  by  previous  definitions. 

Randomness  in  Interactive  Proofs 

Interactive  proofs  [130]  are  an  extension  of  the  classical  NP  notion  of  efficient  provability 
in  which  nondeterminism  is  enhanced  by  two  new  ingredients:  randomness  and  interaction. 
While  the  verifier  in  an  NP  proof  is  deterministic,  the  verifier  in  an  interactive  proof  is 
allowed  to  flip  coins.  In  addition,  the  prover  and  the  verifier  can  interact  for  a  polynomial 
number  of  rounds  of  message  exchange.  Recent  results  by  Fortnow,  Karloff,  Lund,  Nisan 
[114],  and  Shamir  [267]  showing  that  IP=PSPACE  suggest  that  adding  the  new  ingredients 
indeed  enlarges  the  class  of  languages  which  can  be  efficiently  recognized. 
Furthermore,  both  of  the  new  ingredients  are  necessary:  if  the  verifier  were  not  probabilistic,
then  the  prover  could  simply  simulate  all  the  moves  of  the  verifier  on  his  own  and  IP  would 
collapse  to  NP;  in  the  absence  of  interaction,  IP  would  collapse  to  BPP. 
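
To  make  the  roles  of  randomness  and  interaction  concrete,  here  is  a  small  Python  sketch  of  the
classic  textbook  interactive  proof  for  graph  non-isomorphism.  It  is  not  taken  from  the  work
reported  here.  The  function  names  are  hypothetical,  and  the  all-powerful  prover  is  simulated  by
brute  force  over  all  relabelings,  so  the  sketch  is  only  practical  for  very  small  graphs.

```python
import random
from itertools import permutations


def relabel(edges, perm):
    """Apply the vertex relabeling perm to an undirected edge list."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)


def isomorphic(n, e1, e2):
    """Brute-force isomorphism test on n vertices (stands in for the prover's power)."""
    target = frozenset(frozenset(e) for e in e2)
    return any(relabel(e1, p) == target for p in permutations(range(n)))


def run_round(n, g0, g1, rng):
    """One round: returns True if the prover names the verifier's secret coin."""
    # Verifier: flip a private coin b and send a randomly relabeled copy of g_b.
    b = rng.randrange(2)
    perm = list(range(n))
    rng.shuffle(perm)
    h = [tuple(e) for e in relabel((g0, g1)[b], perm)]
    # Prover: answer 0 if h looks like g0, otherwise 1.
    guess = 0 if isomorphic(n, g0, h) else 1
    # Verifier: accept this round only if the prover recovered b.
    return guess == b
```

If  the  two  graphs  are  not  isomorphic,  the  prover  answers  correctly  in  every  round.  If  they  are
isomorphic,  the  relabeled  graph  carries  no  information  about  the  coin,  so  any  prover  is  caught
with  probability  1/2  per  round.  A  verifier  that  announced  its  coin,  or  flipped  none,  would  gain
nothing  from  the  exchange.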

An  important  question  then  is  how  much  of  these  resources  are  really  necessary.  How  much 
interaction  is  necessary  has  received  much  attention  (see  for  example,  [31]  and  others).  How 
much  randomness  is  necessary  for  IP  has,  on  the  other  hand,  received  no  attention. 

Together  with  Bellare,  Rompel,  and  Goldreich  [42],  we  started  an  investigation  into  how 
much  randomness  is  necessary  to  recognize  a  language  L ∈ IP  in  a  given  number  of  rounds
and  a  given  error  probability.  In  particular,  we  show  how  to  transform  an  interactive  proof
system  for  L  in  which  the  verifier  tosses  l  coins  per  round  and  the  error  probability  is  1/3  into  an
interactive  proof  system  for  L  with  the  same  number  of  rounds  and  error  probability  2^{-k},  in  which
the  verifier  tosses  O(l + k)  coins  per  round.  Previous  transformations  required  O(lk)  coins  per  round.


Professional  Activities 

During  1989-90,  Goldwasser  was  an  invited  speaker  to  the  short  course  on  “Number  Theory 
and  Cryptography”  held  by  the  AMS  in  Boulder,  CO  during  August.  The  lecture  notes  will 
be  assembled  into  a  book  by  the  American  Mathematical  Society.  Goldwasser  was  also  an 
invited  speaker  to  the  British  Colloquium  on  computer  science  held  in  Manchester,  Britain  in 
March  1990,  and  an  invited  speaker  to  the  International  Congress  of  Mathematicians  (ICM) 
in  Japan,  August  1990. 

Goldwasser  also  participated  in  the  Oberwolfach  Conference  on  Cryptography,  Germany,  October
1989;  the  DIMACS  Workshop  on  Distributed  Computing  and  Cryptography,  October
1989;  and  the  Workshop  on  Circuit  Complexity  in  Barbados,  organized  by  McGill  University,
February  1990. 

Goldwasser  edited  “Advances  in  Cryptology:  CRYPTO’88,”  published  by  Springer-Verlag,
which  appeared  in  February  1990.  The  book  contains  the  proceedings  of  the  CRYPTO’88
Conference  held  in  Santa  Barbara  in  August  1988,  for  which  she  was  a  Chairperson.  In
addition,  Goldwasser  continued  to  be  an  editor  for  SIAM  Journal  on  Computing,  Journal  of
Cryptology,  and  a  new  journal  on  the  foundations  of  computer  science. 

Finally,  Goldwasser  is  writing  up  two  sets  of  lecture  notes:  the  first  on  computational
number  theory  and  cryptography  (a  result  of  the  short  AMS  course),  and  the  second  a
manuscript  for  an  introductory  cryptography  class.

Leonidas  Guibas 

During  the  past  year,  Guibas,  together  with  several  coworkers,  studied  certain  natural  combinatorial
questions  in  geometry.  The  results  obtained  are  of  the  following  form:  if  we  are
given  points  in  some  Euclidean  space  and  many  “nice”  objects,  each  “defined”  by  only  a  few 
of  the  points,  then  a  large  number  of  these  objects  must  intersect  at  a  common  point  (not 
necessarily  one  of  the  given  points).  For  instance,  if  we  are  given  n  sites  on  the  real  line,  and 
m  >  2n  intervals  defined  by  pairs  of  these  sites,  then  there  has  to  be  a  point  of  the  real  line 
contained  in  at  least  m^2/(4n^2)  of  the  intervals.
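
The  interval  statement  can  be  checked  numerically  on  small  inputs.  The  Python  sketch  below
computes,  by  a  sweep  over  endpoints,  the  largest  number  of  closed  intervals  sharing  a  common
point  and  compares  it  with  the  bound,  assuming  the  bound  reads  m^2/(4n^2).  The  function  name
max_stabbing_depth  and  the  sample  sites  are  illustrative  assumptions.

```python
from itertools import combinations


def max_stabbing_depth(intervals):
    """Return the largest number of closed intervals that share a common point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens; sorts before closes at the same x
        events.append((hi, 1))  # 1 = interval closes
    events.sort()
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


# Six sites on the line and all 15 intervals defined by pairs of sites.
sites = [0, 1, 2, 3, 4, 5]
intervals = list(combinations(sites, 2))
n, m = len(sites), len(intervals)
depth = max_stabbing_depth(intervals)
bound = m * m / (4 * n * n)
```

Here  m = 15 > 2n = 12,  and  the  deepest  point  lies  well  above  the  guaranteed  value,  as  expected
for  a  worst-case  lower  bound.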
These  covering  results  lead  to  many  interesting  consequences  in  combinatorial  and  computational
geometry.  They  can  be  used  to  show  that,  given  any  set  P  of  n  points  in  space,
there  exists  another  set  Q  of  roughly  n^{1/2}  points  such  that  the  Delaunay  triangulation  of
P ∪ Q  has  size  roughly  only  O(n^{3/2}),  even  though  that  of  P  itself  can  have  size  Ω(n^2)  [70].
Furthermore,  the  set  Q  can  be  efficiently  computed  from  P.  Thus  in  three  dimensions,  if  a 
point  set  has  an  unacceptably  large  Delaunay  triangulation,  we  can  always  add  some  extra 
points  and  reduce  the  size  of  the  triangulation.  Such  results  are  important  in  building  finite 
element  codes. 

Another  application  of  the  packing  results  yields  the  following:  for  any  set  S  of  n  points  in 
the  plane,  and  any  n^{3-α}  triangles  spanned  by  these  points,  there  exists  another  point  (not
necessarily  of  S)  contained  in  at  least  (roughly)  n^{3-3α}  of  the  triangles.  This  implies  that  any
set  of  n  points  in  three  dimensional  space  has  at  most  (roughly)  n^{8/3}  halving  planes  [14].
This  is  a  big  improvement  on  the  previously  best  known  upper  bound  for  one  of  the  most 
famous  problems  in  combinatorial  geometry. 

On  a  different  front,  Guibas,  together  with  some  ex-students  and  coworkers  from  Japan, 
continued  the  development  of  geometric  algorithms  based  on  the  topological  sweep  paradigm 
introduced  by  him  and  Edelsbrunner  [101]  a  few  years  back.  New  results  this  year  include 
an  extension  of  the  topological  sweep  algorithm  to  handle  an  arrangement  of  planes  in  three 
dimensions  [11],  as  well  as  a  delicate  refinement  of  the  planar  method  that  is  capable  of 
sweeping  the  portion  of  an  arrangement  inside  a  convex  region  in  time  proportional  to  the 
complexity  present  in  that  region  [15].  Both  of  these  extensions  have  many  applications  that 
are  given  in  the  referenced  papers. 

Guibas  and  coworkers  at  DEC/SRC  continued  previous  work  on  developing  a  framework  for 
building  robust  geometric  algorithms.  They  developed  a  proof  that,  given  any  set  of  points 
in  the  plane,  there  exists  a  polygon  that  is  convex  enough  to  be  found  with  approximate 
tests,  such  as  floating-point  arithmetic,  and  is  also  very  close  to  tightly  containing  all  the 
points  of  the  given  set.  Such  a  polygon  is  a  very  useful  approximation  to  the  convex  hull 
of  the  points  in  an  environment  (such  as  all  real  geometry  systems)  where  one  needs  to 
operate  with  primitives  of  limited  precision.  They  also  developed  an  algorithm  for  finding 
this  approximate  convex  hull  [137].  The  computation  of  convex  hulls  (this  time  in  an  infinite 
precision  model)  also  gave  rise  to  a  new  data  structure  called  compact  interval  trees.  This 
is  a  structure  that  allows  one  to  compute  very  quickly  convex  hulls  of  subpaths  of  a  given 
simple  polygonal  path  in  the  plane.  This  and  related  results  are  given  in  [135]. 

Finally,  Guibas,  Knuth,  and  Sharir  developed  an  extremely  simple  new  incremental  algorithm 
for  computing  Voronoi  diagrams  or  Delaunay  triangulations  of  point  sets  in  the  plane.  The 
algorithm  takes  optimal  time  O(n log n)  if  we  randomize  the  sequence  of  insertion  of  the
sites.  It  also  has  many  other  pleasant  properties,  including  the  fact  that  it  allows  one  to  build
a  point-location  structure  for  searching  the  corresponding  diagram  without  additional  effort
[136].

Service  to  the  profession:  During  the  past  year  Guibas  served  on  the  Program  Committee 
for  the  SIGGRAPH  ’90  ACM  Conference,  and  continued  to  serve  on  the  editorial  board  of 
nine  journals. 
Tom  Leighton

Leighton  and  his  colleagues  continued  to  make  substantial  progress  on  packet  routing  and 
sorting  algorithms,  fault-tolerance  in  networks,  and  on  approximation  algorithms  for  NP-complete
problems.

Highlights  of  this  year’s  research  include  the  discovery  of  the  first  O(log N)-step  algorithm
for  online  routing  in  a  nonblocking  network  (joint  with  Sanjeev  Arora  and  Bruce  Maggs), 
and  the  discovery  of  a  simple  sorting  circuit  with  depth  7.5  log  N  that  works  for  almost  all 
inputs  (joint  with  Greg  Plaxton).  Both  results  appear  to  be  practical  and  could  well  have 
applications  to  the  design  of  parallel  machines  and  communications  networks.  In  particular, 
Leighton  and  Maggs  are  working  with  members  of  Tom  Knight’s  group  on  the  design  of  a 
parallel  network  for  the  Transit  Machine. 
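
For  readers  unfamiliar  with  sorting  circuits  and  their  depth,  the  following  Python  sketch  builds
Batcher’s  bitonic  sorting  network.  This  is  a  different,  older  circuit  than  the  one  described  above:
it  sorts  every  input  but  has  depth  log N (log N + 1)/2  rather  than  7.5 log N.  The  function  names
are  hypothetical,  and  N  is  assumed  to  be  a  power  of  two.

```python
from itertools import product


def bitonic_layers(n):
    """Return the comparator layers of a bitonic sorting network on n wires.

    Each comparator (lo, hi) leaves the smaller value on wire lo and the
    larger on wire hi. Comparators within one layer touch disjoint wires,
    so the number of layers is the depth of the circuit.
    """
    layers = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            layer = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    layer.append((i, partner) if ascending else (partner, i))
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def apply_network(layers, values):
    """Run the values through the network and return the output wires."""
    v = list(values)
    for layer in layers:
        for lo, hi in layer:
            if v[lo] > v[hi]:
                v[lo], v[hi] = v[hi], v[lo]
    return v


def sorts_all_zero_one_inputs(layers, n):
    """Check all 0/1 inputs; by the zero-one principle this implies it sorts everything."""
    return all(apply_network(layers, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))
```

For  N = 8  the  network  has  6  layers.  A  circuit  that  is  only  required  to  work  for  almost  all  inputs,
as  in  the  result  above,  can  be  shallower  than  one  that  must  pass  this  exhaustive  check.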

The  new  ACM  Symposium  on  Parallel  Algorithms  and  Architectures  got  off  to  a  great  start 
with  its  first  meeting  in  Santa  Fe  last  year,  combining  the  best  of  theory  and  practice  in 
the  parallel  area.  This  year’s  meeting  will  be  in  Crete,  and  an  excellent  collection  of  papers 
has  already  been  selected  for  presentation  at  the  meeting.  Leighton  will  continue  to  serve  as 
Conference  Chair  for  the  symposium  until  the  1992  meeting.  The  1991  meeting  will  be  in 
late  July  on  Hilton  Head  Island,  South  Carolina. 

Leighton  continues  to  write  his  book  on  parallel  algorithms  and  architectures  (ever  so  slowly), 
and  he  is  optimistic  that  with  the  addition  of  Bruce  Maggs  as  coauthor,  the  project  can  be 
completed  this  year! 

Both  Mark  Hansen  and  John  Rompel  will  complete  their  theses  this  year,  and  have  done 
a  great  job.  Mark  is  going  into  business,  and  is  off  to  a  great  start  after  winning  a  $5,000 
prize  in  the  MIT  entrepreneurship  contest.  John  Rompel,  on  the  other  hand,  became  the 
first  student  ever  to  win  two  best  student  paper  awards  at  the  FOCS  and  STOC  conferences. 
John’s  awards  were  for  his  discovery  of  methods  for  derandomizing  parallel  algorithms  (joint 
with  Bonnie  Berger),  and  for  his  design  of  secure  digital  signature  schemes. 

Charles  E.  Leiserson 

Leiserson’s research is currently focusing on the problems of building very large scale computers
having, perhaps, more than a billion processors. In addition, he has been studying
parallel  algorithms  and  architectures,  as  well  as  phenomena  in  VLSI  circuitry.  A  highlight 
of  his  research  this  year  is  an  algorithm  developed  with  his  student  Alexander  Ishii  that 
provides the first polynomial-time test for correct functioning of level-clocked circuitry [149].
Also,  much  of  Leiserson’s  earlier  work  on  digital  circuit  retiming  with  James  B.  Saxe  of 
Digital  Equipment  Corporation  was  revised  and  published  [189]. 

Leiserson also completed writing his textbook Introduction to Algorithms [80], coauthored
with Ronald Rivest and Thomas Cormen. The book offers a comprehensive, but elementary,
introduction  to  the  analysis  of  computer  algorithms.  It  is  published  jointly  by  the  MIT  Press 
and  McGraw-Hill  Book  Company. 

Three of Leiserson’s Ph.D. students completed their degrees in the past year. Ronald I.
Greenberg completed his thesis, Efficient Interconnection Schemes for VLSI and Parallel
Computation, and assumed an assistant professorship at the University of Maryland. Bruce
M. Maggs completed Locality in Parallel Computation, and became a postdoc in the group.
Cynthia A. Phillips completed Theoretical and Experimental Analyses of Parallel
Combinatorial Algorithms, and assumed a research position at Sandia National Laboratory
in New Mexico.

Leiserson  is  currently  supervising  Bobby  Blumofe,  Thomas  Cormen,  Alexander  Ishii,  Shlomo 
Kipnis,  and  James  Park. 

In  September,  Leiserson  assumed  leadership  of  the  newly  formed  VLSI  and  Parallel  Systems 
Group  at  MIT  when  Paul  Penfield,  who  had  run  the  Microsystems  Research  Center,  became 
head  of  the  EECS  Department.  The  programs  run  by  the  MRC  were  divided  between  VPS 
and  the  Microsystems  Technology  Laboratories.  The  group  currently  has  eight  faculty  drawn 
from  three  MIT  Laboratories:  Artificial  Intelligence  Laboratory,  Laboratory  for  Computer 
Science, and Research Laboratory for Electronics. The goal of the VPS Group is to understand
how integrated circuit technology can be applied to the design and construction of
high  performance  computer  systems. 

The  VPS  Group  ran  four  important  programs  last  year.  The  Sixth  MIT  VLSI  Conference 
in April brought together technical leaders in the university, industry, and government communities.
The attendees agreed it was one of the strongest conferences we have had. Jointly
with  the  Microsystems  Technology  Laboratories,  the  VPS  Group  also  supported  the  MIT 
VLSI  Seminar  Series,  the  MIT  VLSI  Research  Review,  and  the  MIT  VLSI  memo  series. 

Nancy  Lynch 

Please  see  her  entry  under  the  Theory  of  Distributed  Systems  chapter. 

Albert  R.  Meyer 

Meyer’s  research  focuses  on  semantics  and  logic  of  programming  languages.  During  the  past 
year,  he  worked  on  the  following  particular  research  topics. 

Research  Topics 

•  Semantics of Concurrency: Meyer and Bloom continued their study of basic notions
of concurrent process equivalence [56][55][57][58].

•  Semantics of Terminating Evaluation: Research with Riecke and Cosmadakis (IBM
Watson Research Center) on the “lazy” lambda calculus, which repairs a mismatch
in classical semantics, where expressions M and λx.M mean the same thing
even though evaluation of M may diverge while evaluation of λx.M always terminates
immediately; cf. [223][59][81]. In [82], they demonstrate completeness and decidability
of an axiom system for equational sequents involving pure, simply typed terms of the
“lazy” lambda calculus.

•  Dataflow  Semantics :  See  the  report  of  Arie  Rudich. 

•  Theory  of  Sequential  Functions:  See  the  report  of  Trevor  Jim. 

•  Type-checking for records with inheritance: See the report of Lalita Jategoankar.
Professional Activities

•  Co-chair,  “International  Conference  on  Theoretical  Aspects  of  Computer  Software,” 
Sendai,  Japan,  September  1991. 

•  General  Chair,  IEEE  Symposium  on  Logic  in  Computer  Science  (LICS). 

•  Moderator for three computer science research email forums on (1) types, (2) concurrency,
and (3) logic.

•  Member,  Program  Committee,  International  Symposium  on  Logic  at  Botik,  Pereslavl- 
Zalessky,  USSR,  July  1989;  “Kleene  ’90”  Logic  Symposium,  Chaika,  Bulgaria,  June 
1990. 

•  Member,  NSF  Computer  Science  Advisory  Board. 

•  Thesis  Supervision: 

Ph.D.  1.  Bard  Bloom,  completed  August  1989. 

2.  Jon  Riecke. 

3.  David  Wald. 

4.  Lalita  Jategoankar. 

S.M.  1.  Michael  Ernst. 

2.  Arthur  F.  Lent. 

3.  Lalita  Jategoankar,  completed  August  1989  [151]. 

4.  Trevor  Jim. 

5.  Arie  Rudich,  completed  May  1990  [256]. 

S.B.  1.  Arthur  F.  Lent,  completed  January  1990  [190]. 

•  Editorial  Activity: 

Editor-in-Chief, Information and Computation; Managing Editor, Annals of Pure and
Applied Logic; Co-editor, MIT Press Foundations of Computing Series; Editorial Board
Member, SIAM Journal on Computing, Journal of Computer and System Sciences, Theoretical
Computer Science, and Advances in Applied Mathematics; Advisory Editor,
Handbook of Logic in Computer Science and Handbook of Theoretical Computer Science;
Co-editor, Proceedings of Logic at Botik [224]; and Member, MIT Press Editorial
Board.

Silvio  Micali 

Most of Micali’s research efforts have been devoted to developing theory in the area of secure
protocols. Here, there is good news and bad news. The good news is that we are dealing
with  a  novel  and  exciting  subject  that  may  prove  very  useful  in  an  electronic  world.  The 
bad  news  is  that  the  field  is  very  difficult  and  largely  unexplored. 
A program is already a very difficult beast to tame, and deciding whether a given program
is correct is not only impossible in general, but also “impossible” in practice. Worse yet,
a protocol is a program run by many parties, some of which may cooperate in disrupting
the joint effort! In this situation, it is difficult not only to find correct protocols, but also
to say what correctness should mean. We were fortunate, though, to make progress on
finding the right notion of security for multiparty protocols and on a theoretical study of how
efficient these protocols can be made. We describe some token results in this line.

The  Right  Notions.  The  field  of  secure  protocols  has  quickly  outpaced  its  foundations. 
Valuable  definitions  for  secure  protocols  have  been  given  (in  various  degrees  of  explicitness) 
by many researchers, including us. However, these definitions differed among themselves
in subtle but crucial ways, and sometimes designated as “secure” protocols that should
not be so designated. In [166], we finally take an important step toward letting the foundations
catch up. Namely, we distill the proper notion of secure computation, and endow
the field with a sound and most general definition of it.

Minimizing Resources. Having clarified what secure protocols should be, we turned our
attention to investigating the minimum amount of resources needed to implement
protocols in a secure way.

In [40], we consider the following problem. Assume we have n parties, 1 through n; each
party i has a private input x_i known only to him. The parties want to evaluate correctly a
given function f on their inputs, that is, to compute y = f(x_1, ..., x_n), while maintaining
the privacy of their own inputs. That is, they do not want to reveal more than the value y
implicitly reveals. This problem is the archetypal secure protocol problem. It was known to
be solvable before, but the number of rounds of communication for correctly and privately
evaluating f grew with the complexity of the function f. This was indeed a drawback, since
one can perform a lot of cryptographic computation in a few seconds, but sending
e-mail back and forth a thousand times easily requires a month. Fortunately, we have been
able to show that there exists a protocol for correctly and privately evaluating any function,
independently of its complexity, in a constant number of rounds (actually 16).
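
To make the problem statement concrete, the following minimal Python sketch privately evaluates one very simple function, the sum of the parties' inputs, using additive secret sharing among parties that follow the rules. This is a standard textbook technique chosen only for illustration; it is not the constant-round protocol of [40], and the names MODULUS, share, and private_sum are assumptions of this sketch.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative modulus; inputs are treated as residues mod MODULUS


def share(value, n):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def private_sum(inputs):
    """Compute sum(inputs) mod MODULUS without any party revealing its input."""
    n = len(inputs)
    # Round 1: party i sends its j-th share to party j.
    dealt = [share(x, n) for x in inputs]
    # Round 2: party j publishes only the sum of the shares it received.
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Anyone can add the published partial sums to obtain y.
    return sum(partial) % MODULUS
```

For example, private_sum([3, 5, 9]) returns 17. Each published partial sum is masked by random shares, so it reveals nothing about an individual input beyond what the final value y already implies.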

Sometimes,  there  are  circumstances  in  which  not  even  a  constant  number  of  communication 
rounds  is  deemed  practical.  What  then?  In  [43],  we  show  that  fundamental  protocols,  like 
the  oblivious  transfer,  can  be  implemented  securely  without  any  interaction  at  all!  In  essence, 
we  exhibit  a  method  to  compute  special,  matching  encoding-decoding  key  pairs.  User  A, 
having produced such a pair (P,S), will publish P and keep S secret. At this point, a sender B
can look up P, make sure that it satisfies a given condition, and encode two messages m0 and
m1 using the public encoding key. B is guaranteed that user A, thanks to his secret decoding
key S, will be able to decode exactly one of these two encrypted messages: either m0 or m1,
at random. Moreover, the sender B will not know which of the two messages A succeeded
in  reading.  An  exchange  like  the  one  discussed  above  is  called  an  oblivious  transfer.  It  is  a 
fundamental  primitive  in  protocol  design  and  it  was  previously  believed  to  require  a  lot  of 
interaction  for  achieving  its  “paradoxical”  constraints.  Instead,  we  show  that  A  and  B  do 
not  need  to  interact,  except  for  A  publishing — once  and  for  all — the  public,  encoding  key. 
This  result  improves  a  previous  one  of  Bellare  and  Micali  where,  however,  the  transfers  were 
not  independent.  That  is,  if  B  sent  to  A  two  more  messages,  and  A  succeeded  the  first  time 
around  in  reading  the  first  message,  then  he  will  read  the  first  message  also  the  second  time. 
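
The message flow described above can be sketched in Python with a toy discrete-logarithm construction. This is an illustration in the style of earlier Diffie-Hellman-based non-interactive oblivious transfer, not the scheme of [43]. The parameters P, G, C and all function names are assumptions of this sketch, and the parameters are far too small to be secure.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (insecure size)
G = 3           # toy base; not claimed to generate the whole group
# Public constant. In a real setup nobody may know log_G(C); here the exponent
# is simply thrown away after use.
C = pow(G, secrets.randbelow(P - 2) + 1, P)


def pad(shared, length):
    """Derive a pad of `length` bytes (at most 32) from a group element."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def receiver_keygen():
    """A publishes (b0, b1) with b0*b1 = C, knowing the discrete log of only one."""
    choice = secrets.randbelow(2)
    x = secrets.randbelow(P - 2) + 1
    known = pow(G, x, P)
    other = C * pow(known, -1, P) % P
    public = (known, other) if choice == 0 else (other, known)
    return public, (choice, x)


def sender_encode(public, m0, m1):
    """B checks the published key, then encrypts each message (<= 32 bytes)."""
    b0, b1 = public
    if b0 * b1 % P != C:
        raise ValueError("public key fails the consistency check")
    out = []
    for b, m in ((b0, m0), (b1, m1)):
        y = secrets.randbelow(P - 2) + 1
        out.append((pow(G, y, P), xor(m, pad(pow(b, y, P), len(m)))))
    return out


def receiver_decode(secret, ciphertexts):
    """A can strip the pad only from the ciphertext matching its known exponent."""
    choice, x = secret
    a, masked = ciphertexts[choice]
    return choice, xor(masked, pad(pow(a, x, P), len(masked)))
```

The only message from A is the one-time publication of the key pair; B never learns which half A can open, because both halves look like ordinary group elements whose product is C.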
In [165], we minimize the resources needed in a zero-knowledge proof. These are the number
of ciphertexts (as each ciphertext involves an overhead in number of bits) and the number of
rounds of interaction. We show that two ciphertexts are enough. Also, we show that after a
small preprocessing step, given any one-way function, one can prove an unbounded number
of theorems in zero knowledge, each without interaction. That is, each zero-knowledge proof
consists of a single message, from the prover to the verifier, to which the verifier need not
respond. Thus, the proving process is extremely efficient once the preprocessing step is done.
Moreover, such a preprocessing step depends logarithmically on the desired probability of
error, and so is also very efficient. Previous methods also exhibited a polynomial dependence on
the size of the theorem (i.e., the number of bits in its statement, which can be very large).

Additional  results  on  minimizing  the  resources  in  zero-knowledge  protocols  can  be  found  in 
references. 

Making  it  easy.  In  [46],  we  show  how  the  design  of  cryptographic  protocols  can  be  vastly 
simplified.  Essentially,  we  exhibit  a  cryptographic  compiler  that,  given  a  protocol  that  is 
secure  with  respect  to  an  honest  participant,  returns  a  protocol  that  is  secure  even  against 
an  arbitrarily  cheating  one. 

Make  it  efficient.  In  [106],  we  address  the  problem  of  efficiency  for  the  fundamental 
primitive  of  digital  signatures.  Computing  a  digital  signature  is  feasible  for  the  legal  signer, 
though  not  trivial.  However,  it  should  be  infeasible  for  a  forger.  An  RSA  signature  with 
a  512-bit  security  parameter  appears  hard  to  forge  with  current  knowledge  about  factoring 
algorithms, and can be easily computed in a few seconds. However, should the difficulty of,
say, integer factorization substantially decrease, then signing will require minutes even for the
legal signer, since a much larger security parameter would be needed. This is unpleasant, since
the signing of a message can start only after the message is chosen. Thus, we develop an
alternative approach to digital signatures that allows us to
transfer the bulk of the computation to an offline stage that can be performed when the
message of interest is not yet chosen. Online, that is, after the message has been selected, only a
trivial amount of work is required.
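
One well-known way to realize this offline/online split is to certify a fresh one-time key with the slow conventional scheme ahead of time, and then sign the actual message with the fast one-time key. The Python sketch below illustrates that idea under stated assumptions and is not necessarily the construction of [106]. The "slow" scheme is textbook RSA over two small Mersenne primes, the one-time scheme is a Lamport-style hash signature, and every name is hypothetical. None of it is secure as written.

```python
import hashlib
import secrets


def H(data):
    return hashlib.sha256(data).digest()


# Toy long-term signer standing in for the expensive conventional signature.
P, Q, E = 2**127 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def slow_sign(data):
    return pow(int.from_bytes(H(data), "big") % N, D, N)


def slow_verify(data, sig):
    return pow(sig, E, N) == int.from_bytes(H(data), "big") % N


def bits(digest):
    return [(byte >> k) & 1 for byte in digest for k in range(8)]


def offline_phase():
    """Before the message is known: create a one-time key pair and certify it."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = b"".join(H(s0) + H(s1) for s0, s1 in sk)
    return sk, pk, slow_sign(pk)


def online_phase(message, token):
    """After the message is chosen: one hash plus 256 table lookups."""
    sk, pk, cert = token
    reveal = [sk[i][b] for i, b in enumerate(bits(H(message)))]
    return pk, cert, reveal


def verify(message, signature):
    pk, cert, reveal = signature
    if not slow_verify(pk, cert):
        return False
    for i, b in enumerate(bits(H(message))):
        start = 64 * i + 32 * b  # position of the expected hash inside pk
        if H(reveal[i]) != pk[start:start + 32]:
            return False
    return True
```

Each token produced by offline_phase must be used for a single message only; the expensive modular exponentiation happens entirely before the message exists.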

One of the proposed directions of research is finding how much security is obtainable in protocols
without relying on complexity theory. Previous work by Goldwasser, Ben-Or, and Wigderson
and by Chaum, Crépeau, and Damgård showed that having physically secure communication
channels  and  an  honest  majority  may  be  enough.  Recently  Kilian  and  Micali  [163]  showed 
that  one  can  do  without  an  honest  majority,  by  using  some  simple  physical  mechanism  like 
an  urn  and  ballots. 

Ronald  L.  Rivest 

Rivest's  research  focuses  primarily  on  the  theoretical  aspects  of  machine  learning. 

Together  with  Javed  Aslam,  Rivest  investigated  the  problem  of  inferring  the  structure  of  a 
Markov chain from its output. The special case of inferring the structure of Markov chains
in which each node has degree at most two has been effectively handled by a new algorithm.

Together  with  Bonnie  Eisenberg,  Rivest  investigated  the  power  of  “membership  queries”  for 
learning  (in  the  distribution-free  sense  of  Valiant)  various  concept  classes.  A  general  theorem 
has  been  proven  showing  that  for  many  concept  classes  membership  queries  do  not  improve 
the  efficiency  of  learning. 
Rivest supervised an S.B. thesis of Robert T. Adams on the problem of inferring Lisp programs
from examples, in which queries to the user are used to resolve ambiguities that arise
during  the  inference  process. 

Together  with  Tom  Cormen  and  Charles  Leiserson,  Rivest  has  finished  an  introductory 
text  on  algorithms  [80].  This  text  should  be  suitable  for  both  introductory  undergraduate 
and  introductory  graduate  courses.  Rivest  developed  a  new  one-way  hash  function,  called 
“MD4,”  which  he  proposed  for  adoption  as  a  standard  “message  digest”  algorithm. 

Michael  Sipser 

Sipser’s  work  continues  to  focus  on  aspects  of  complexity  theory,  including  the  structure  of 
complexity  classes  and  proving  lower  bounds  on  the  complexity  of  specific  problems. 

Together  with  Michelangelo  Grigni,  Sipser  [134]  proposed  a  system  of  monotone  complexity 
classes,  paralleling  the  familiar  system.  This  includes  monotone  analogs  of  the  classes  P,  NP, 
L, NL, NC, and AC^0, among others. This allows a succinct statement of a number of the
previously known results in the area of monotone complexity, as well as suggesting several new
lines of research. Monotone computation is of interest because it provides a nontrivial setting
where the analogs of a number of the long-standing unsolved problems in computational
complexity theory may be settled, possibly shedding some light on the case for general
computation. One of the new results obtained is that the
monotone counterpart to the class NL is not closed under complementation, in contrast with
the recent results of Immerman and Szelepcsényi. One may view this as an argument that
their  simulation  must  be  inherently  more  complicated  than  many  of  the  older  simulations, 
which  also  go  through  for  the  monotone  case. 

Together  with  Ravi  Boppana,  Sipser  has  completed  a  survey  of  recent  work  on  the  complexity 
of  finite  boolean  functions  [63].  This  will  appear  in  the  forthcoming  Handbook  of  Theoretical 
Computer  Science. 

Last  August,  Sipser  participated  in  a  conference  at  the  University  of  Chicago  organized 
around  a  series  of  ten  lectures  that  he  gave  on  circuit  complexity  and  the  P  versus  NP 
question.  The  meeting  was  quite  successful,  drawing  over  one  hundred  participants. 

13.3  Student,  Research  Associate,  and  Visitor  Reports 

Javed  A.  Aslam 

Aslam  has  been  working  with  Ron  Rivest  on  the  inference  of  random  walk  processes.  A 
random  walk  process  is  modeled  as  a  graph  where  nodes  represent  states  and  edges  represent 
transitions  from  state  to  state.  If  the  edges  are  colored,  then  the  output  of  a  random  walk 
process  is  the  sequence  of  colors  corresponding  to  the  transitions  taken  in  a  random  walk  on 
the  underlying  graph.  The  aim  of  this  research  is  to  be  able  to  reconstruct  the  underlying 
graph  given  an  output  sequence  of  colors.  Aslam  and  Rivest  [17]  developed  polynomial  time 
algorithms  to  construct  minimum  consistent  underlying  graphs  of  degree  2.  Algorithms  to 
construct  underlying  graphs  of  degree  >  2  are  being  pursued. 
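
The model described above can be sketched in a few lines of Python. The example graph, the color names, and the helper names are all hypothetical. The sketch only generates an output sequence and checks whether a candidate graph is consistent with one; it does not implement the inference algorithms of [17].

```python
import random

# Hypothetical example graph: state -> list of (color, next_state) edges.
GRAPH = {
    "a": [("red", "b"), ("blue", "a")],
    "b": [("red", "a"), ("green", "c")],
    "c": [("blue", "a"), ("green", "b")],
}


def random_walk_output(graph, start, steps, seed=None):
    """Emit the colors of the edges taken by a random walk from `start`."""
    rng = random.Random(seed)
    state, colors = start, []
    for _ in range(steps):
        color, state = rng.choice(graph[state])
        colors.append(color)
    return colors


def is_consistent(graph, start, colors):
    """Check whether some walk from `start` could have emitted `colors`."""
    states = {start}
    for color in colors:
        states = {nxt for s in states for c, nxt in graph[s] if c == color}
        if not states:
            return False
    return True
```

The inference problem runs in the opposite direction: given only a sequence such as the one returned by random_walk_output, find a small graph for which is_consistent holds.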
Aslam has also been working with Aditi Dhagat on the problem of 2-coloring k-hypergraphs.
Aslam and Dhagat [16] developed an optimal, online algorithm for 2-coloring a k-hypergraph
when the hypergraph has fewer than 2^{k-1} edges. Further, they have shown that if the
hypergraph has more than (3 + 2√2)^k edges, then no online coloring algorithm exists.

Mihir  Bellare 

Bellare, Micali, and Ostrovsky investigated zero-knowledge interactive proofs in several
directions.

Quadratic residuosity and graph isomorphism are classic problems and the canonical
examples of zero-knowledge languages. However, despite much research effort, all previous
zero-knowledge proofs for them required either unproven complexity assumptions or an
unbounded number of rounds of message exchange. For both languages (and more generally
for any random self-reducible language), we exhibit in [45] zero-knowledge proofs that require
only five rounds and no unproven assumptions.

Statistical  zero-knowledge  is  a  very  strong  privacy  constraint  which  is  not  dependent  on 
computational  limitations.  In  [46],  it  is  shown  that  given  a  complexity  assumption  a  much 
weaker  condition  suffices  to  attain  statistical  zero-knowledge.  As  a  result  it  is  possible  to 
simplify statistical zero-knowledge and to better characterize, on many counts, the class of
languages  that  possess  statistical  zero-knowledge  proofs. 

Security in cryptography is traditionally proven via reductions. This can, however, lead to
considerable loss of provable security in practice. Reference [44] analyzes various “coin flipping in the
well” protocols from this point of view and finally provides a new protocol which performs
better than previous ones.

Bellare, Cowen, and Goldwasser investigated in [41] the properties of the Secret Key Exchange
Protocol.  This  work  is  described  in  Cowen’s  report. 

The power of interactive proofs arises from the use of two resources: interaction and randomness.
While the first has been much investigated, the second is considered for the first time
in  [42].  Here,  it  is  shown  how  to  substantially  reduce  the  amount  of  randomness  necessary 
for  the  verifier  to  be  convinced  of  membership  in  a  language,  within  a  given  error  probability 
and  a  given  number  of  rounds  of  interaction. 

Bonnie  Berger 

This year, Berger continued her research on removing randomness from algorithms, especially
focusing on parallel algorithms. All related work is contained in her completed doctoral
dissertation [37]. There are three steps to removing randomness from parallel algorithms:
first, design a randomized parallel algorithm; second, modify this algorithm to work using a
weaker form of randomness (one which has a smaller probability space); third, deterministically
search this probability space for a point on which the algorithm performs well.
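
The three steps can be illustrated on a classic small example, chosen here for brevity rather than taken from [37]: finding a cut that contains at least half the edges of a graph. A random side assignment achieves this in expectation; the analysis needs only pairwise independence, which a seed of about log n bits provides; and the resulting sample space of at most 2n points can be searched exhaustively. The function names below are assumptions of this sketch, and the search is written sequentially even though each seed could be tried by a separate processor.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints fall on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def pairwise_bits(seed, n):
    """Side of vertex v is the parity of seed & (v + 1).

    Over a uniformly random seed these bits are uniform and pairwise
    independent, because the masks 1..n are distinct and nonzero.
    """
    return [bin(seed & (v + 1)).count("1") % 2 for v in range(n)]


def derandomized_cut(n, edges):
    """Search the whole small sample space for the best side assignment."""
    k = n.bit_length()  # 2**k > n, so every mask 1..n fits in k bits
    best = max(range(2**k), key=lambda s: cut_size(edges, pairwise_bits(s, n)))
    return pairwise_bits(best, n)
```

Because the expected cut size over the sample space is half the number of edges (for graphs without self-loops), the best point found by the search is guaranteed to cut at least that many.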

Berger, together with Rompel [49][36], developed a general framework for removing randomness
from randomized NC (RNC) algorithms whose analysis uses only polylogarithmic independence.
Previously no techniques were known to determinize those RNC algorithms depending
on more than constant independence. One application of their techniques is the first
NC algorithm for the set discrepancy problem, which can be used to obtain many other NC
algorithms, including a better NC edge coloring algorithm and the first NC lattice approximation
algorithm. As another application of their techniques, they provided the first NC
algorithm for a hypergraph coloring problem. Additionally, they showed the set discrepancy
problem actually requires (log n)-wise independence. This work received the Machtey Award
for Best Student Paper at FOCS ’89.

Berger, working with Rompel and Peter Shor [50], gave the first NC approximation algorithms
for the unweighted and weighted set cover problems. Their algorithms use a linear
number of processors and give a cover that has at most log n times the optimal size/weight,
thus matching the performance of the best sequential algorithms. Previously, there were no
known parallel algorithms for the general set cover problem. Berger, Rompel, and Shor devised
a randomized algorithm, depending on only pairwise independence, and then converted
it  to  a  deterministic  one.  Furthermore,  they  applied  their  set  cover  algorithm  to  learning 
theory,  giving  an  NC  algorithm  to  learn  the  concept  class  obtained  by  taking  the  closure 
under  finite  union  or  finite  intersection  of  any  concept  class  of  finite  VC-dimension  which 
has  an  NC  hypothesis  finder.  In  addition,  they  gave  a  linear-processor  NC  algorithm  for  a 
variant  of  the  set  cover  problem,  and  used  it  to  obtain  NC  algorithms  for  several  problems 
in  computational  geometry. 
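
For reference, the sequential baseline with a logarithmic approximation guarantee is usually taken to be the classical greedy heuristic, sketched below for the unweighted case. This is an assumption about which sequential algorithm is meant, it is not the NC algorithm of [50], and the function name is hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset covering the most still-uncovered elements.

    `subsets` is a list of Python sets; the result is a list of chosen indices.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

The loop is inherently sequential, since each choice depends on what the previous choices covered; removing that dependence while keeping the same approximation factor is what makes a parallel algorithm for the problem nontrivial.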

Berger, working with Peter Shor [51], devised the first nontrivial randomized polynomial-time
and RNC algorithms for the maximum acyclic subgraph problem (or equivalently, the
minimum  feedback  arc  set  problem),  and  used  known,  highly  sequential  techniques  to  convert 
the  randomized  polynomial-time  algorithm  to  a  deterministic  one. 

Berger  [47]  introduced  a  new  combinatorial  method  to  bound  the  expectation  of  an  absolute 
value  from  below  by  a  fourth  moment.  She  presented  a  special  case  of  this  lower  bound  in 
a  particularly  useful  form — yielding  a  general  mathematical  inequality  on  expectations.  She 
applied  her  fourth  moment  method  to  obtain  the  first  nontrivial,  and  also  fairly  efficient,  NC 
algorithm  for  the  maximum  acyclic  subgraph  problem.  The  method  she  used  to  derive  this 
algorithm  allowed  her  to  obtain  a  new  bound  on  tournament  ranking.  She  also  applied  her 
fourth  moment  method  to  devise  the  first  NC  algorithm  (which  is  also  fairly  efficient)  for  the 
problem  of  obtaining  large  discrepancy.  This  can  be  used  to  obtain  the  first  NC  algorithms 
for  the  Gale-Berlekamp  switching  game  and  the  edge  discrepancy  problem.  In  fact,  she 
showed  that  it  is  truly  necessary  to  consider  a  fourth  moment — that  is,  the  requirement  of 
4-wise independence is tight.

Diverging from the topic of removing randomness, Berger, with Cowen [48], investigated a
new, natural enlarged class of scheduling problems, allowing concurrency and weak precedence
constraints, as well as the classical partial order constraints among tasks. They proved
that the problem of scheduling with both prerequisites and co-requisites, and the problem
of scheduling with just ≤ constraints, are both NP-complete for k ≥ 3 parallel processors.
They gave an O(n^3) algorithm for optimally scheduling with any subset of {<, ≤, =} constraints
for k = 2, and obtained approximation algorithms for k ≥ 3 processors. Their
results  have  applications  to  the  Horizon  architecture,  an  architecture  currently  receiving  a 
good  deal  of  attention  in  Supercomputing  research. 

Berger is currently beginning an NSF Mathematical Sciences Postdoctoral Research Fellowship
at MIT under the sponsorship of Daniel Kleitman.

Margrit  Betke 

Betke  began  her  studies  as  a  graduate  student  at  MIT  in  September  1989.  She  spent  most 
of  the  year  on  coursework,  and  worked  on  problems  in  computational  geometry.  Betke  is 
interested  in  algorithms  for  machine  learning.  She  plans  to  work  with  Ron  Rivest  during  the 
summer  of  1989. 

Avrim  Blum 

Blum  has  been  working  on  problems  in  computational  learning  theory  and  continuing  his 
work  in  approximation  algorithms  for  graph  coloring. 

In  the  area  of  machine  learning  theory,  he  developed  a  model  for  learning  from  examples  in 
situations  where  there  is  a  very  large  number  of  potential  features  or  attributes  that  examples 
might have, even though each example itself may have only a few of them [60]. For instance,
this might occur if one wished to classify research papers based on keywords; each paper
has  only  a  few  keywords,  but  the  space  of  potential  keywords  is  quite  large  and  perhaps  not 
even  known  beforehand.  He  shows  that  many  of  the  basic  kinds  of  learnable  concepts  in 
the  standard  theoretical  models  can  be  learned  through  different  methods  in  this  new  model 
as  well.  Blum  has  also  worked  on  examining  the  relationship  between  two  existing  popular 
theoretical  learning  models  [61].  He  shows  a  method  using  techniques  from  cryptography 
to  create  concepts  easy  to  learn  in  one  model  but  hard  to  learn  in  the  other.  This  method 
allows  one  to  focus  on  important  differences  between  the  models. 

Together  with  Singh,  Blum  has  been  studying  the  problem  of  learning  concepts  that  can 
be  described  as  a  function  of  a  small  number  of  monomials  over  a  set  of  Boolean  variables. 
These concepts generalize the class of k-term DNF studied by Pitt and Valiant [247]. Blum
and Singh show that the generalized class can be learned, but that to avoid intrinsic computational
limitations, the learner must use an expanded hypothesis representation. That
is,  if  the  learner  is  forced  to  represent  his  hypotheses  in  the  same  form  as  the  concept  being 
learned,  then  the  problem  becomes  hard  (NP-complete),  but  if  the  learner  is  allowed  a  freer 
selection  of  hypotheses,  then  learning  can  be  done  quickly. 

In  the  area  of  approximation  algorithms  for  graph  coloring,  Blum  improved  on  previous 
performance  guarantees  for  the  problem  of  approximate  coloring  of  3-chromatic  graphs.  He 
presents an algorithm that will color any 3-chromatic graph with O(n^{3/8 + o(1)}) colors in the
worst  case  [62].  In  addition,  he  is  examining  the  problem  of  coloring  graphs  generated  by 
random  and  “semi-random”  sources. 

Tom  Cormen 

Cormen,  along  with  Charles  Leiserson  and  Ron  Rivest,  completed  writing  the  textbook 
Introduction  to  Algorithms  [80],  which  is  being  copublished  by  The  MIT  Press  and  McGraw- 
Hill.  The  book  should  appear  in  May  or  June  1990. 

Cormen  is  now  pursuing  doctoral  research.  He  is  currently  working  with  Leiserson  and 
Shlomo  Kipnis  on  the  notion  of  locality-enhancing  graph  embeddings:  given  a  source  graph 
G and a target graph H, embed G in H so that if two vertices u and v in G are distance l
apart (i.e., the shortest path between u and v contains l edges), then the distance between
u and v in H is f(l), where f is as small a function as possible. Cormen and Kipnis have
proven  upper  bounds  on  embedding  linear  graphs  in  d-dimensional  meshes,  X-trees,  and 
hypercubes,  and  they  are  continuing  this  line  of  research. 

Lenore  Cowen 

In  [41],  Bellare,  Cowen,  and  Goldwasser  investigated  the  properties  which  must  be  inherent 
in any Secret Key Exchange Protocol, which is a protocol enabling two parties to establish a
common  and  secret  key  over  public  channels.  They  showed  strong  structural  requirements 
for  such  a  protocol  in  both  the  uniform  and  non-uniform  model.  These  in  turn  allowed  them 
to prove that one-way functions are necessary for secret key exchange (as a corollary, one-way
functions are necessary for oblivious transfer). Their results imply that the existence
of a secret key exchange protocol also implies the existence of strong bit commitment schemes
(i.e., schemes in which the committer can reveal only one possible value, regardless of the
computational power of the committer).

Cowen, with Berger [48], investigated a new, natural, enlarged class of scheduling problems,
allowing concurrency and weak precedence constraints, as well as the classical partial order
constraints among tasks. They proved that the problem of scheduling with both prerequisites
and co-requisites, and the problem of scheduling with just ≤ constraints, are both
NP-complete for k ≥ 3 parallel processors. They gave an O(n^3) algorithm for optimally
scheduling with any subset of {<, ≤, =} constraints for k = 2, and obtained approximation
algorithms for k ≥ 3 processors. Their results have applications to the Horizon architecture,
an architecture currently receiving a good deal of attention in supercomputing research
[100][179][280].

Cowen  plans  to  continue  work  on  issues  raised  in  her  work  with  Berger;  she  also  plans  to 
investigate  further  cryptographic  problems  with  Goldwasser. 

Aditi  Dhagat 

Dhagat  was  a  teaching  assistant  for  the  graduate  Theory  of  Computation  course  (6.840/18.404) 
during  the  fall  semester.  She  also  worked  with  Sipser  on  the  problem  of  constructing  pseudo¬ 
random  number  generators  which  are  secure  without  requiring  unproven  assumptions.  This 
is  in  the  spirit  of  results  of  Nisan  and  Wigderson  [232],  where  pseudorandom  generators 
secure against constant-depth polynomial-size circuits were shown to exist. The security relied on proven
lower  bounds  on  circuit  size  for  computing  the  parity  function.  A  later  result  of  Nisan  [231] 
constructs  pseudorandom  generators  secure  against  one-way  logspace  machines,  where  the 
security  comes  from  using  universal  hash  functions.  This  generator  stretches  a  random  seed 
of size log^2 n to a pseudorandom string of length n. Dhagat and Sipser are also working on
the  problem  of  improving  this  stretch,  while  maintaining  the  strength  of  the  security  of  the 
generator. 

During  the  spring  semester,  Dhagat  worked  with  Aslam  on  a  problem  suggested  by  Joel 
Spencer. She and Aslam [16] have shown online algorithms for 2-coloring k-hypergraphs.
Their results show that if the k-hypergraph has no more than 2^{k-1} edges, then it can be
2-colored by an optimal online algorithm. If the k-hypergraph has bounded degree (say k)
and has more than (3 + 2√2)^k edges, then they show that there is no online algorithm to
2-color it, regardless of computation time.

Michael  Ernst 

Ernst  spent  most  of  his  time  on  coursework  this  year,  but  also  continued  work  with  Meyer. 
He  will  intensify  his  research  on  programming  language  semantics  during  the  summer,  and 
expects  to  complete  his  S.M.  thesis  within  the  year. 

Wayne  Goddard 

Goddard  is  a  second-year  student  focusing  on  combinatorics.  He  has  also  been  working  on 
“partial”  sorting  problems.  In  these  problems,  one  is  asked  to  determine  only  some  of  the 
relationships  between  the  elements;  for  example  one  might  be  given  a  collection  of  subsets  of 
the  elements,  and  asked  to  find  the  maximum  in  each  set.  Together  with  King  and  Schulman 
[119],  Goddard  found  randomized  algorithms  for  two  such  problems  which  use  significantly 
fewer comparisons than naive algorithms. It is also shown in [119] that these algorithms are
optimal  to  within  a  constant  factor. 

Sally  A.  Goldman 

Goldman  continued  to  work  in  the  area  of  computational  learning  theory.  With  Rivest  and 
Schapire,  Goldman  studied  the  problem  of  designing  polynomial  prediction  algorithms  for 
learning  binary  relations  [126].  They  study  these  problems  under  an  online  model  in  which 
the  instances  are  drawn  by  the  learner,  by  a  helpful  teacher,  by  an  adversary,  or  according 
to  a  probability  distribution  on  the  instance  space.  Their  goal  is  to  minimize  the  number  of 
incorrect predictions. They represent the relation as an n × m binary matrix, and present
results  for  when  the  matrix  is  restricted  to  have  at  most  k  distinct  row  types,  and  when  it  is 
constrained  by  requiring  that  the  predicate  form  a  total  order.  Namely,  for  both  cases  they 
present  upper  and  lower  bounds  on  the  number  of  mistakes  any  learning  algorithm  makes 
when  learning  such  a  matrix  under  their  extensions  of  the  online  learning  model. 

With  Kearns,  Goldman  is  exploring  some  of  the  interesting  questions  raised  by  the  extended 
mistake  bound  model  discussed  above.  Namely,  they  are  studying  how  the  complexity  of  the 
learner’s  task  (as  judged  by  the  number  of  mistakes)  depends  on  the  presentation  order  of 
the  instances.  When  a  teacher  selects  the  presentation  order,  they  consider  the  maximum 
number  of  mistakes  made  by  any  learner  that  predicts  according  to  some  concept  that  agrees 
with  all  previously  seen  instances.  Thus  they  ask:  what  is  the  minimum  number  of  examples 
a  teacher  must  reveal  to  uniquely  identify  the  target  concept?  They  are  also  studying  the 
related  question  of  how  many  mistakes  the  learner  must  make  in  the  worst  case  when  selecting 
the  presentation  order  for  the  instances  himself. 

With  Kearns  and  Schapire,  Goldman  has  been  investigating  aspects  of  the  PAC  model  [287]. 
They  developed  a  new  technique  for  exactly  identifying  certain  classes  of  Boolean  formulas 
from  random  examples  [124].  (Furthermore,  they  prove  that  these  classes  of  circuits  are 
not  efficiently  learnable  in  the  PAC  model.)  Their  method  is  based  on  the  observation  of 
the  input-output  behavior  of  the  target  formula  on  a  fixed  probability  distribution  which  is 
determined by the fixed point of the circuit’s amplification function (defined as the probability
that a 1 is output when each input is 1 with probability p). By demonstrating that the
circuit’s  behavior  is  unstable  in  an  appropriate  sense  on  this  distribution,  they  are  able 
to  infer  all  structural  information  about  the  circuit  (with  high  probability)  by  performing 
various  statistical  tests  on  easily  sampled  variants  of  the  fixed  distribution. 
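
To make the notion of an amplification function concrete, the sketch below computes it by
brute force for a small formula and locates an interior fixed point by bisection. The
example formula, the bracket [0.1, 0.9], and the names `amplification` and
`interior_fixed_point` are assumptions chosen for illustration; the identification
procedure of [124] itself is not reproduced.

```python
from itertools import product


def amplification(formula, n_inputs, p):
    """Probability that formula outputs 1 when each input is 1 independently with probability p."""
    total = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        if formula(bits):
            ones = sum(bits)
            total += p ** ones * (1 - p) ** (n_inputs - ones)
    return total


def interior_fixed_point(formula, n_inputs, lo=0.1, hi=0.9, iters=60):
    """Bisection on A(p) - p; requires a sign change on [lo, hi]."""
    def gap(p):
        return amplification(formula, n_inputs, p) - p

    if gap(lo) * gap(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def example_formula(bits):
    """(x1 AND x2) OR (x3 AND x4), so A(p) = 1 - (1 - p^2)^2."""
    x1, x2, x3, x4 = bits
    return bool((x1 and x2) or (x3 and x4))
```

For this formula the equation A(p) = p reduces to p^2 + p - 1 = 0 on the open interval (0, 1),
so the interior fixed point is (√5 - 1)/2.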

With Kearns and Schapire, Goldman has also been studying the complexity of weak learning
[125]: namely, how much data must be collected from an unknown distribution in order
to extract a small but significant advantage in prediction. Such a study is motivated in part
by  settings  in  which  there  is  a  limited  supply  of  examples. 

Goldman  will  be  finishing  her  Ph.D.  [123]  in  June,  and  has  accepted  a  faculty  position  at 
Washington  University  in  St.  Louis  beginning  in  August. 

Michelangelo  Grigni 

Grigni  is  a  fourth  year  graduate  student  supervised  by  Sipser.  His  primary  research  involves 
the  construction  of  fast  robust  networks  for  various  kinds  of  communication  problems,  con¬ 
tinuing  work  begun  on  the  broadcasting  problem  with  David  Peleg  [133]  of  the  Weizmann 
Institute. 

With  Sipser,  Grigni  is  developing  a  classification  of  monotone  space  complexity  classes, 
including monotone logspace and monotone bounded-width branching programs. Independently
(and recently with Guibas), Grigni is attempting to tighten the large complexity gaps
of  various  sum-sorting  problems. 

Mark  D.  Hansen 

Hansen  completed  his  work  in  graph  embeddings  [139]  and  spent  most  of  this  year  working 
with  Leighton  and  Aggarwal  on  computational  geometry  problems  relating  Voronoi  diagrams 
and  query-retrieval  problems  [4].  The  research  which  he  has  done  in  both  of  these  areas 
is  presently  being  incorporated  into  his  Ph.D.  thesis.  Hansen  anticipates  graduating  in 
December  1990. 

In  his  most  recent  work  with  Aggarwal  and  Leighton,  Hansen  describes  a  new  technique  for 
solving  a  variety  of  query-retrieval  problems  in  optimal  time  with  optimal  or  near-optimal 
space.  In  particular,  he  uses  the  technique  to  construct  algorithms  and  data  structures  for 
circular range searching, half-space range searching, and computing k-nearest neighbors in a
variety of metrics. For each problem and each query, the response to the query is provided
in O(k) or O(k + log n) time, where k is the size of the response and n is the size of the
problem. (E.g., for the n-point k-nearest neighbors problem, the k-nearest neighbors of any
query point are provided in O(k + log n) steps.) Depending on the problem being solved, the
space required for the data structure is either linear or O(n log n). Hence, the time bounds
are optimal and the space bounds are optimal or near-optimal. Previously known data
structures for these problems required a factor of Ω(log n (log log n)^2) or Ω(log n log log n)
more space and/or more time to answer each query.

The  compaction  technique  introduced  incorporates  planar  separators,  filtering  search,  and 
the probabilistic method for discrepancy problems. The fundamental idea is that kth-order
Voronoi diagrams (and other suitable proximity diagrams) can be compacted from k^{O(1)} n
space  to  O(n)  space  and  still  retain  all  the  information  that  is  essential  for  solving  query 
problems. 

Alexander  T.  Ishii 

Ishii  generalized  his  VLSI  timing  analysis  algorithms  using  the  notion  of  a  “base  step” 
function  to  encapsulate  assumptions  about  when  signal  values  change  during  the  operation 
of  a  circuit.  He  has  shown  how  various  base  step  functions  can  be  used  to  provide  sufficient 
conditions  for  a  circuit  to  operate  properly.  The  base  step  function  is  used  to  derive  a 
“computational  expansion”  of  the  circuit  from  which  a  collection  of  simple  linear  constraints 
are  derived.  These  constraints  can  be  efficiently  checked  using  standard  graph  algorithms. 
In  addition,  the  algorithm  can  be  adapted  to  determine  the  maximum  frequency  at  which  a 
circuit  can  be  clocked  and  to  produce  the  limiting  critical  path. 

Ishii  and  Leiserson  developed  a  new  base  step  function  which  is  less  pessimistic  than  the 
analogous  ones  used  in  previous  timing  verifiers,  yet  correctly  handles  timing  constraints  that 
are  “cyclic”  or  extend  across  the  boundaries  of  multiple  clock  phases  or  cycles.  If  a  circuit 
is  modeled  as  a  graph  G  =  (V,  E),  where  V  consists  of  components — latches  and  functional 
elements — and  E  represents  intercomponent  connections,  the  new  base  step  function  results 
in an algorithm which verifies the proper timing of a circuit in worst-case O(|V||E|) time
and O(|V|^2) space [149].
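
The report does not state the exact form of the linear constraints. As a hedged sketch,
suppose each one is a difference constraint x[v] - x[u] <= w (an assumption made here,
not a claim about [149]). Such a system can be checked with Bellman-Ford relaxation in
O(|V||E|) time; the function name `feasible` is hypothetical.

```python
def feasible(num_vars, constraints):
    """Check a system of difference constraints x[v] - x[u] <= w.

    constraints is a list of (u, v, w) triples. Returns one satisfying
    assignment as a list, or None if the constraint graph contains a
    negative-weight cycle (in which case no assignment exists).
    """
    # Starting every value at 0 is equivalent to adding a virtual source
    # with a zero-weight edge to each variable.
    dist = [0] * num_vars
    # One pass more than strictly necessary; a negative cycle forces a
    # change in every pass, so the loop only falls through in that case.
    for _ in range(num_vars + 1):
        changed = False
        for u, v, w in constraints:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    return None
```

A cycle of constraints whose weights sum to a negative number is exactly what makes the
system infeasible, which is why a shortest-path computation suffices as the check.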

Ishii  has  also  been  working  with  Thomas  Knight  of  the  MIT  Artificial  Intelligence  Labora¬ 
tory  on  a  self-terminating,  digitally-controlled,  and  ECL-compatible  output  pad  driver  for 
high  speed  integrated  circuits.  By  automatically  series-terminating  driven  lines  with  their 
characteristic  impedances,  the  driver  realizes  speed,  power,  and  noise  improvements  over 
conventional  designs.  Series  termination  is  realized  by  exploiting  the  intrinsic  series  resis¬ 
tance  of  the  output  drive  transistors.  The  design  has  not  yet  been  fabricated,  but  simulations 
indicate  that  data-transition  rates  in  excess  of  100MHz  are  possible. 

Lalita  A.  Jategaonkar 

Jategaonkar  finished  her  Master’s  thesis  [151],  which  further  develops  previous  work  with 
John  C.  Mitchell,  a  professor  at  Stanford  University.  The  thesis  was  supervised  by  Albert 
Meyer.  This  research  develops  an  extension  of  the  programming  language  ML  in  which  a 
restricted  object-oriented  style  can  be  achieved.  In  keeping  with  the  framework  of  ML,  a 
type  derivation  system  and  a  type  inference  algorithm  are  presented.  It  is  proved  that  the 
algorithm  is  sound  and  complete  with  respect  to  the  type  derivation  system,  and  that  it 
infers  a  most  general  typing  of  every  typeable  expression  in  the  language.  A  technical  report 
based  on  the  thesis  was  also  published  this  year  [152],  and  Jategaonkar  is  currently  working 
on  a  journal  version  of  this  work. 

Trevor  Jim 

Working  with  Meyer,  Jim  continued  work  on  models  of  the  language  PCF  [249],  a  simply- 
typed  lambda  calculus. 
He has shown that the models of Berry and Curien [83][52], based on stable functions and
sequential algorithms, are not restrictions of the usual Scott model. These models were developed
as alternatives to the standard model, which contains troublesome “non-sequential”
elements.

Extending  the  work  of  Bloom  [54],  Jim  has  shown  that  there  is  no  extension  of  PCF  by  SOS 
rules  for  which  Berry’s  stable  function  model  is  fully  abstract.  Further  directions  include 
extending  this  result  to  the  other  models,  and  the  design  of  a  restricted  model  based  on 
Berry’s  bidomains. 

This research will comprise Jim’s Master’s thesis, which he plans to write over the summer.

Nabil Kahale

Kahale entered the department in September 1989. He had earlier completed work on modular
properties of the Bell numbers. His results will appear in the Journal of Combinatorial
Theory  Series  A  [155]. 

Kahale  is  now  working  with  Tom  Leighton  on  the  analysis  of  various  routing  protocols  on 
a  mesh  of  arrays  by  applying  new  mathematical  techniques.  He  also  plans  to  work  on  the 
construction of graphs having good expansion properties. Expanders are widely used in the
design  of  algorithms  and  networks  in  the  field  of  parallel  computation. 

Michael  Kearns 

Kearns  continued  his  research  in  the  area  of  machine  learning.  Together  with  Lenny  Pitt 
of  the  University  of  Illinois,  Kearns  gave  an  algorithm  for  learning  pattern  languages  with 
a  bounded  number  of  variables  from  random  examples  drawn  according  to  an  arbitrary 
product  distribution  over  the  substitution  variables  [157].  Pattern  languages  are  motivated 
by  problems  in  string  matching  and  genetics.  Recent  results  by  other  researchers  indicate 
that  the  pattern  learning  problem  with  an  unbounded  number  of  variables  is  intractable,  so 
the  algorithm  of  Kearns  and  Pitt  is  optimal  in  an  informal  sense. 

Together  with  Sally  Goldman  and  Rob  Schapire,  Kearns  developed  an  algorithm  for  exactly 
identifying  certain  classes  of  circuits  based  on  amplification  functions  [124].  The  central 
idea  is  to  observe  the  input-output  behavior  of  the  unknown  circuit  on  a  simple  probability 
distribution  that  is  unstable,  in  the  sense  that  small  perturbations  of  this  distribution  cause 
noticeable  statistical  changes  in  the  circuit’s  output.  These  techniques  can  be  applied  to 
show  the  existence  of  universal  identification  sequences  for  classes  of  functions,  which  are 
fixed  input  sequences  whose  outputs  suffice  to  exactly  identify  any  circuit  from  the  class. 

Again  with  Goldman  and  Schapire,  Kearns  investigated  the  sample  complexity  of  weak 
learning,  in  which  the  learning  algorithm  needs  only  to  find  an  hypothesis  whose  accuracy  is 
slightly  better  than  random  guessing  [125].  This  model  is  motivated  by  situations  in  which 
examples  are  rare  and  also  by  cryptographic  settings,  where  small  biases  can  greatly  compro¬ 
mise  security.  Results  are  given  that  demonstrate  the  power  of  randomized  hypotheses  in  a 
weak  learning  setting,  and  a  partial  combinatorial  characterization  of  weak  learning  sample 
complexity  is  given. 

Kearns and Schapire introduced a new model of learning probabilistic concepts, in which
each  instance  may  have  some  probability  of  being  positive  and  some  probability  of  being 
negative  [158].  Such  a  model  addresses  settings  as  diverse  as  weather  prediction,  where  one 
is  typically  given  a  probability  of  rain  for  the  day,  and  simple  object  classification,  where  a 
probability  may  best  model  the  degree  to  which  an  object  has  the  desired  property.  In  this 
model,  Kearns  and  Schapire  give  many  provably  efficient  learning  algorithms  and  investigate 
the  underlying  theory  of  learning  probabilistic  concepts.  This  includes  general  techniques 
for  constructing  good  learning  algorithms  and  a  characterization  of  the  sample  size  based  on 
uniform  convergence  results  that  generalizes  the  Vapnik-Chervonenkis  dimension. 

Finally,  Kearns  has  spent  part  of  the  year  making  revisions  to  his  doctoral  dissertation,  The 
Computational  Complexity  of  Machine  Learning,  which  will  be  published  by  The  MIT  Press 
in  September. 

Beginning  in  September  1990,  Kearns  will  be  at  the  International  Computer  Science  Institute 
in  Berkeley,  California. 

Joe  Kilian 

Kilian has been working on the complexity of various cryptographic protocols. For instance,
given  two  infinitely  powerful  parties  who  have  access  to  an  ideal  commitment  scheme  (en¬ 
velopes),  can  one  party  efficiently  commit  to  a  set  of  bits,  and  give  a  zero-knowledge  proof 
for  an  arbitrary  boolean  predicate  on  these  committed  bits?  Here,  “efficient”  means  using 
a  polynomial  number  of  envelopes.  Surprisingly,  the  answer  is  yes  [38].  This  result  is  used 
to  prove  new  upper  bounds  on  the  communication  complexity  of  certain  secure  distributed 
computations  [38]. 

Joe  also  worked  on  so-called  robust  transformations  of  interactive  proof  systems  [159].  In 
a  robust  transformation,  the  new  prover  is  not  allowed  to  be  substantially  more  powerful 
than  the  original  prover.  He  considered  the  problem  of  how  to  robustly  transform  interactive 
proof  systems  into  zero-knowledge  interactive  proof  systems. 

Joe  worked  on  developing  interactive  proof  systems  with  some  provable  security  properties 
[160].  Finally,  he  found  even  more  ways  of  implementing  oblivious  transfer  in  terms  of 
seemingly  weaker  cryptographic  protocols  [161]. 

Shlomo  Kipnis 

Kipnis  continued  his  investigation  of  bussed  interconnections.  He  is  trying  to  further  explore 
the  power  of  bussed  interconnection  schemes  for  routing  permutations  and  realizing  various 
communication  patterns.  Bussed  interconnection  schemes  and  their  relation  to  difference 
covers  were  explored  by  Joe  Kilian,  Shlomo  Kipnis,  and  Charles  Leiserson  in  [162]. 

Kipnis also investigated priority arbitration schemes that employ m busses to arbitrate
among n modules. His novel binomial arbitration scheme, which uses at most m = lg n + 1
busses, achieves an arbitration time of t = (1/2) lg n (in units of bus-settling delay).
This scheme substantially outperforms the commonly used binary arbitration scheme, which
uses m = lg n busses and achieves an arbitration time of t = lg n. Furthermore, his generalized
binomial arbitration scheme achieves a bus-time tradeoff of the form m = Θ(t n^{1/t}) between
the number of arbitration busses m and the arbitration time t, for values of 1 ≤ t ≤ lg n
and lg n ≤ m ≤ n. These new schemes can be adopted with no changes to existing hardware
and protocols; they merely involve selecting a good set of priority arbitration codewords
[167]. Kipnis also applied for a patent on these new arbitration schemes, through the
Technology Licensing Office.
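
For a sense of scale, consider n = 1024 modules, assuming lg denotes the base-2 logarithm
and reading the binomial scheme's arbitration time as half of lg n. The figures below are
plain arithmetic on the stated formulas, not measurements from [167].

```latex
\begin{align*}
\text{binary scheme:}   \quad & m = \lg 1024 = 10,     & t &= \lg 1024 = 10,\\
\text{binomial scheme:} \quad & m = \lg 1024 + 1 = 11, & t &= \tfrac{1}{2}\lg 1024 = 5.
\end{align*}
```

Under these assumptions, one additional bus halves the arbitration time.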

In  addition,  he  compiled  a  survey  paper  on  the  problem  of  range  queries  in  computational 
geometry.  Range  queries  are  a  fundamental  problem  in  computational  geometry  with  ap¬ 
plications  to  computer  graphics  and  database  retrieval  systems.  The  survey  paper  identifies 
three  general  methods  for  range  queries  in  computational  geometry  and  classifies  many  of 
the  recent  research  results  into  one  or  more  of  these  methods  [168]. 

Michael  Klugerman 

Klugerman  took  courses  in  algorithms,  complexity  theory,  and  probability.  He  TA’d  18.409 
Probabilistic  Methods  in  the  spring,  and  worked  on  routing  problems  on  the  hypercube. 
Also, he plans to write a paper with Daniel Kleitman and others on a geometry problem
posed to them by Paul Erdős.

Dina  Kravets 

Kravets  spent  the  fall  semester  TAing  18.435,  and  continuing  work  with  Alok  Aggarwal  and 
James  Park  on  algorithms  for  totally  monotone  and  Monge  arrays.  Their  algorithms  for 
parallel  searching  in  generalized  Monge  arrays  (together  with  Sandeep  Sen)  will  appear  in 
[6].  During  the  spring  semester,  she  started  working  with  Leo  Guibas  on  some  problems  in 
computational  geometry. 

Arthur  Lent 

Lent became a graduate student at MIT in February 1990. Prior to then, he was an undergraduate
member of TOC. Working under Meyer’s supervision, he completed his Bachelor’s
thesis [190] in January 1990. The thesis presents a call-by-name SECD machine. Earlier
work on SECD machines [248][185] had focused on call-by-value SECD machines. The basic
correctness result for the machine demonstrates that the operational behavior of this
machine corresponds to the operational behavior of PCF [249].

Lent  also  developed  a  set  of  notes  for  a  three  week  unit  on  “Programming  Language  Theory” 
for  use  in  the  undergraduate  class  “Computability,  Programming,  and  Logic.”  These  notes 
gave a development of the semantics of a call-by-value variant of PCF that attempted
to capture the “Functional Kernel of Scheme (FKS).” The notes provided two operational
characterizations, one via rewrite rules and one via an SECD machine, and established their
operational  equivalence.  The  notes  culminated  in  a  development  of  a  denotational  semantics 
for  FKS.  The  denotational  semantics  were  based  on  Plotkin’s  partial  function  model.  Lent 
and  Riecke  jointly  worked  to  establish  a  direct  proof  of  adequacy  for  this  model. 

Lent also became interested in intuitionistic logic and its applications to computer science.
His reading in intuitionistic logic has focused mainly on current efforts to explore the notion of
“Feasibly Constructive Arithmetic,” which is an attempt to generalize the notion of polynomial
time to higher type. One goal of this exploration would be to extract polynomial time
algorithms from feasibly constructive proofs.

Leonid Levin

Comparing  VLSI  models  [150]  shows  when  the  effect  of  wires’  speed  overtakes  any  effect  of 
wires’  width  and  gives  the  best  possible  (slightly  superlinear)  threshold  at  which  all  effects 
of  wires’  width  vanish. 

[150]  shows  the  unexpected  equivalence  of  average  case  NP-completeness  in  two  classes  of 
distributions:  P-time  computable  and  P-time  samplable. 

Random linear predicates R(x) of inputs x of one-way functions f were previously shown
(in [Goldreich Levin 89]) to be unpredictable from f(x), R. Their security was equal to the
security of f within a constant power. [194] makes this result tight: constant power is just
1 and the security loss is only a factor of |x|^{O(1)}.

Bruce  Maggs 

Maggs is studying algorithms for routing packets in switching networks. With Tom Leighton,
he developed a multibutterfly routing algorithm that is both robust against faults and efficient
from a practical point of view. For example, on an N-input multibutterfly with k faulty
switches, the algorithm can route any permutation between some set of N - O(k) inputs and
N - O(k) outputs in O(log N) time. Bruce Maggs, Sanjeev Arora, and Tom Leighton also
designed the first efficient on-line algorithm for path selection in an optimal-size nonblocking
network. Viewed in a telephone switching context, the algorithm can put through any
sequence of calls among N parties on a network of size O(N log N) with O(log N)-bit-step
delay per call, even if many calls are made simultaneously. Finally, Bill Aiello, Tom Leighton,
Bruce Maggs, and Mark Newman discovered a randomized O(log N)-bit-step algorithm for
bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves
upon the best previously known algorithms by a logarithmic factor. The algorithm is
adaptive, and by generalizing the Borodin-Hopcroft lower bound on oblivious routing, they
show that any other O(log N)-bit-step algorithm must be adaptive too.

Yishay  Mansour 

Mansour’s  interests  include  distributed  communication  networks,  machine  learning,  and  com¬ 
plexity. 

In  the  area  of  machine  learning,  he  continues  to  investigate  the  possibilities  of  achieving 
learnability  by  considering  the  Fourier  transform.  In  [215],  he  describes  various  learning 
algorithms  in  this  spirit. 

In the area of distributed computing, he is mainly interested in routing algorithms. In
work with Cidon, Kutten, and Peleg [74], he investigates simple routing strategies. In
work with Schulman [218], he proves a tradeoff between time and space for sorting on a
ring.

In work with Nisan and Tiwari [216], he investigates the computational complexity of
universal hash functions. They are able to establish a time-space tradeoff for any implementation
of such functions.
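
For readers unfamiliar with the object being implemented, the sketch below shows one
textbook universal family, the Carter-Wegman construction h(x) = ((a x + b) mod P) mod m.
It is included only to fix ideas and is not taken from [216]; the modulus `P`, the key range,
and the name `make_hash` are assumptions of this example.

```python
import random

# 2^61 - 1 is a Mersenne prime; keys are assumed to be integers in [0, P).
P = (1 << 61) - 1


def make_hash(m, rng=random):
    """Draw one function h(x) = ((a*x + b) mod P) mod m from the family."""
    a = rng.randrange(1, P)  # a is nonzero
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m
```

For any two distinct keys, a function drawn this way sends them to the same bucket with
probability at most 1/m over the choice of a and b.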

Mojdeh Mohtashemi

Mojdeh  entered  the  department  in  September  1989.  During  fall  1989,  she  was  a  teaching 
assistant  for  the  undergraduate  course  in  Structure  and  Interpretation  of  Computer  Programs 
taught  by  Gerald  Sussman.  Mojdeh  spent  most  of  the  year  on  coursework  and  readings  in 
cryptography. 

Carolyn  H.  Norton 

Norton, working with Éva Tardos (Cornell) and Serge Plotkin (Stanford) [234][233], developed
a strongly polynomial algorithm to solve finite dimensional linear programs when the
feasible space is not given explicitly by a set of inequalities, but is instead given by a separation
algorithm.

She  is  currently  working  with  Dimitris  Bertsimas  on  studying  the  relation  between  integer 
programs  and  their  linear  programming  relaxations. 

Rafail  Ostrovsky 

Ostrovsky  is  working  on  feasibility  results  concerning  the  implementation  of  secure  com¬ 
putation  and  proof  systems  in  insecure  communication  environments.  More  specifically,  he 
explores  two  basic  questions:  the  first  is  basing  security  primitives  on  assumptions  as  general 
as  possible  and  making  connections  between  these  primitives,  and  the  second  is  investigating 
various  models  of  computation  in  which  secure  computation  can  be  implemented. 

Mihir Bellare, Silvio Micali, and Rafail Ostrovsky show how the number of rounds of interaction
can be reduced (to a constant) for perfect zero-knowledge interactive proofs for Graph
Isomorphism and Quadratic Residuosity (more generally, for any random self-reducible problem)
[45]. In [46], they show how a statistical zero-knowledge protocol for an honest verifier
can be compiled into a statistical zero-knowledge protocol which works for any (even cheating)
verifier. They examine the power of the prover for giving a zero-knowledge proof; they
compare the “black-box” definition of zero-knowledge to the standard definition; and they
examine whether the zero-knowledge property can be retained if the prover wants to convince
the verifier with probability one [46].

Joe  Kilian,  Silvio  Micali,  and  Rafail  Ostrovsky  show  how  zero-knowledge  protocols  for  NP 
can  be  made  non-interactive,  assuming  a  pre-processing  Oblivious  Transfer  protocol  [164]. 

In  [239],  Ostrovsky  presents  the  first  poly-logarithmic  simulation  of  an  arbitrary  computation 
on  probabilistic  Oblivious  RAM. 

Ostrovsky,  Venkatesan,  and  Yung  [240]  examine  the  model  of  two-party  partial-information 
games, in which one of the players is infinitely powerful while the other is polynomially bounded.
They show that any such game is playable, given any one-way function, and that there exists
a bit-commitment protocol from an infinitely powerful player to a polynomial player, unless
there is no hard-on-average problem in PSPACE.

Marios  C.  Papaefthymiou 

Papaefthymiou  has  been  investigating  parallel  architectures  under  the  supervision  of  C.  E.  Leis- 
erson. 
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His  research  explores  both  theoretical  and  practical  aspects  of  synchronous  circuitry  opti¬ 
mization.  A  general  framework  for  this  problem  has  been  given  by  C.  E.  Leiserson  and 
J. Saxe [188]. Papaefthymiou demonstrated the closed-semiring structure of the retiming
operation on unit-delay circuits. He also gave a concise mathematical characterization of the
minimum clock period of a clocked circuit in terms of the minimum register-to-delay ratio
cycle in the system’s graph representation. This result led to improved algorithms for retiming
of circuits with maximum delay D: an O(V^{1/2} E lg VW lg VD) algorithm for retiming
with a clock period that does not exceed the minimum by more than D, and an O(VE lg D)
algorithm for minimum clock period retiming.

Recently, Papaefthymiou and C. E. Leiserson have been investigating efficient algorithms
for mixed integer optimization problems.

James  K.  Park 

Park  spent  the  last  year  working  with  Alok  Aggarwal  (IBM  Yorktown  Heights),  Dina  Kravets, 
and Sandeep Sen (Duke University) on a number of problems relating to Monge arrays. An
m x n array A = {a[i,j]} is called Monge if for 1 ≤ i < m and 1 ≤ j < n, a[i,j] + a[i+1,j+1]
≤ a[i,j+1] + a[i+1,j]. Monge arrays were introduced in 1987 by Aggarwal, Klawe,
Moran,  Shor,  and  Wilber  [5],  who  showed  that  several  problems  in  computational  geometry 
and  VLSI  theory  could  be  reduced  to  the  problem  of  finding  the  maximum  entry  in  each  row 
of  a  Monge  array.  Since  this  seminal  paper,  Monge  arrays  have  been  studied  by  a  number 
of  researchers,  and  many  additional  applications  of  the  Monge  array  abstraction  have  been 
developed. 
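
The definition above can be made concrete with a minimal Python sketch (an illustration only, not code from the work described). It checks the Monge inequality on adjacent entries and finds the leftmost minimum of each row by divide and conquer, using the fact that in a Monge array the columns of the leftmost row minima are nondecreasing from top to bottom. Row minima are used here because they match the direction of the inequality as stated; the names `is_monge` and `row_minima` are hypothetical.

```python
def is_monge(a):
    """Return True if a[i][j] + a[i+1][j+1] <= a[i][j+1] + a[i+1][j] everywhere."""
    m, n = len(a), len(a[0])
    return all(
        a[i][j] + a[i + 1][j + 1] <= a[i][j + 1] + a[i + 1][j]
        for i in range(m - 1)
        for j in range(n - 1)
    )


def row_minima(a):
    """Column index of the leftmost minimum in each row of a Monge array."""
    m, n = len(a), len(a[0])
    result = [0] * m

    def solve(top, bottom, left, right):
        # Invariant: rows top..bottom have their leftmost minima in columns left..right.
        if top > bottom:
            return
        mid = (top + bottom) // 2
        # min() returns the first minimizer, i.e. the leftmost one in the range.
        best = min(range(left, right + 1), key=lambda j: a[mid][j])
        result[mid] = best
        solve(top, mid - 1, left, best)      # rows above: minima at or left of best
        solve(mid + 1, bottom, best, right)  # rows below: minima at or right of best

    solve(0, m - 1, 0, n - 1)
    return result
```

For example, the array with entries (i - j)^2 satisfies the inequality, and its row minima lie on the diagonal. This simple recursion inspects O(n log m + m) entries; faster searching methods are the subject of the literature cited above.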

Park’s  most  recent  work  with  Monge  arrays  falls  into  three  categories.  First,  Aggarwal  and 
Park  have  been  studying  the  use  of  Monge  arrays  in  solving  economic  lot-size  problems, 
an important class of problems from operations research [8][7]. Second, Aggarwal, Kravets,
Park, and Sen have been developing parallel algorithms (specifically PRAM and hypercube
algorithms) for searching in Monge arrays and generalized Monge arrays [6]. Third, Kravets
and  Park  have  been  investigating  several  selection  and  sorting  problems  in  the  context  of 
Monge  arrays  [178]. 

In  the  coming  year,  Park  plans  to  complete  his  doctoral  thesis,  a  study  of  Monge  arrays  and 
their  applications.  He  also  plans  to  begin  work  with  Aggarwal  and  Marie  Klawe  (University 
of  British  Columbia)  on  a  monograph  on  the  subject  of  Monge  arrays. 

Greg  Plaxton 

Plaxton  is  working  on  the  development  of  algorithms  for  sorting  on  “cube-type”  networks. 
The  class  of  cube-type  networks  includes  the  hypercube,  butterfly,  cube-connected  cycles 
and shuffle-exchange. Note that a special case of sorting is routing, which is arguably the
most important primitive operation required by any large-scale parallel computer. With
Robert Cypher (IBM Almaden), Plaxton has devised a deterministic algorithm for sorting on
cube-type networks that runs in O(log n (log log n)^2) time. The best previous deterministic
algorithm for the sorting problem is due to Batcher, and runs in O(log^2 n) time [35]. If
a certain amount of preprocessing is allowed, the running time of Cypher and Plaxton’s
algorithm can be reduced to O(log n log log n). The best known lower bound for sorting on
cube-type networks remains Ω(log n) (a trivial bound).
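
For comparison, Batcher's earlier method can be sketched in a few lines. The following Python sketch (an illustration, not the Cypher and Plaxton algorithm) generates the comparator stages of Batcher's bitonic sorting network for n = 2^k keys. Every comparator in a stage pairs indices that differ in a single bit, which is why the network maps naturally onto the hypercube, and the number of stages is k(k+1)/2. The function names are hypothetical.

```python
def bitonic_stages(n):
    """Comparator stages of Batcher's bitonic sorting network for n = 2^k inputs.

    Each stage is a list of (low, high) index pairs that can be compared in
    parallel; after a comparison the smaller key sits at index `low`.
    """
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    stages = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j  # neighbor across one hypercube dimension
                if partner > i:
                    if i & k == 0:
                        stage.append((i, partner))   # ascending block
                    else:
                        stage.append((partner, i))   # descending block
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def run_network(stages, keys):
    """Apply the comparator stages to a copy of `keys` and return the result."""
    keys = list(keys)
    for stage in stages:
        for low, high in stage:
            if keys[low] > keys[high]:
                keys[low], keys[high] = keys[high], keys[low]
    return keys
```

For n = 8 the network has 6 stages and for n = 16 it has 10, in line with the quadratic dependence on log n.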

More  recently,  Greg  Plaxton  and  Tom  Leighton  have  been  investigating  randomized  algo¬ 
rithms  for  sorting  on  cube-type  networks.  The  results  follow  from  considering  a  particular 
tournament  based  on  the  butterfly,  and  demonstrating  that  the  tournament  possesses  a 
strong  ranking  property.  The  ranking  property  of  this  tournament  is  exploited  by  using  it  as 
a  building  block  for  efficient  parallel  sorting  algorithms  under  a  variety  of  different  models 
of  computation.  Three  important  applications  are  provided.  First,  a  sorting  circuit  of  depth 
7.5  log  n  is  defined  that  sorts  all  but  a  superpolynomially  small  fraction  of  the  n!  possible 
input  permutations.  Second,  a  randomized  sorting  algorithm  is  given  for  cube-type  networks 
that runs in O(log n) word steps with very high probability. This algorithm has a significantly
smaller probability of failure than any previous randomized algorithm for sorting on
cube-type networks in the word model. Third, a randomized algorithm is given for sorting n
O(m)-bit records on an (n log n)-node butterfly that runs in O(m + log n) bit steps with very
high probability. All previously known algorithms for sorting on cube-type networks require
Ω(log^2 n) bit steps.

Jon  G.  Riecke 

Riecke  has  pursued  the  theory  of  functional  languages.  The  research  focused  on  moving  the 
existing  theory  towards  more  realistic  programming  languages. 

In  joint  work  with  Stavros  Cosmadakis  and  Albert  Meyer,  Riecke  investigated  the  theory 
of  “lazy”  functional  languages  [82].  The  term  “lazy”  applies  to  functional  languages  which 
pass  arguments  call-by-name  but  which  halt  at  functional  abstractions.  Almost  all  call-by¬ 
name  languages  exhibit  this  lazy  behavior,  but  the  standard  theory  of  functional  languages 
does  not  adequately  explain  it.  Building  on  the  denotational  semantics  work  of  Abramsky 
[1], Ong [237][238], Bloom and Riecke [59], and Cosmadakis [81], the three have found a
complete  and  decidable  logic  for  proving  computationally  valid  equations  (“observational 
congruences”)  between  a  certain  class  of  programs.  This  logic  forms  the  basis  of  reasoning 
about  code  in  most  lazy  functional  languages.  Complexity-theoretic  issues  in  the  logic  will 
be  addressed  in  a  forthcoming  paper  by  Cosmadakis  and  Riecke. 

Using  the  work  in  lazy  languages  as  a  basis,  Riecke  also  examined  the  logic  of  call-by-value 
languages [254]. Call-by-value languages form the vast majority of functional languages
(e.g., Lisp, SCHEME, and ML), so this case, like the lazy case, is important from a practical
point of view. The discovery of a complete and decidable proof system for a limited class of
call-by-value equations is the main contribution of this work.

Riecke  has  also  been  working  on  connections  between  the  operational  and  denotational  se¬ 
mantics  of  different  programming  languages.  Specifically,  he  looked  at  the  problem  of  finding 
translations  from  one  programming  language  to  another.  A  general  recursion-theoretic  ar¬ 
gument  shows  that  almost  any  programming  language  can  simulate  another.  The  problem 
lies  in  finding  tasteful  translations  that  do  not  rely  on  the  computational  power  of  the  pro¬ 
gramming  language.  Translations  from  call-by-value  to  lazy  languages,  for  example,  have 
been found, as well as negative theorems demonstrating the impossibility of finding
well-structured translations from lazy to call-by-value. The work will appear in a forthcoming
paper.

Phillip Rogaway

The  problem  of  secure  distributed  function  evaluation  entails  a  network  of  processors  endeav¬ 
oring  to  compute  some  function  on  privately  held  inputs,  but  in  a  manner  which  preserves 
the  privacy  of  these  inputs.  Working  with  Silvio  Micali,  Rogaway  showed  how  to  accomplish 
this  goal  using  only  a  constant  number  of  rounds  of  interaction  [40].  This  vastly  improves 
the  efficiency  of  [127]  and  related  protocols  for  secure  computation. 

Rogaway investigated the possibility of constant-round, information-theoretically secure
computation. In [38], it is shown that the class of functions computable in this way is quite rich.

Foundational  issues  in  secure  computation  have,  so  far,  lagged  behind  the  field’s  accomplish¬ 
ments.  In  [166],  careful  definitions  for  correct  and  private  computation  under  a  dynamic 
adversary  are  developed. 

This  summer,  Rogaway  will  be  working  on  writing  up  his  thesis. 

John  Rompel 

This  year  Rompel  continued  his  research  on  low-independence  randomness.  This  research 
was  along  two  main  lines:  the  use  of  low-independence  distributions  to  remove  randomness 
from  parallel  algorithms  and  the  use  of  low  independence  hash  functions  in  cryptographic 
applications. 

Rompel,  together  with  Berger  [49],  developed  a  general  framework  for  removing  randomness 
from  randomized  NC  algorithms  whose  analysis  uses  only  polylogarithmic  independence. 
Previously,  no  techniques  were  known  to  determinize  those  RNC  algorithms  depending  on 
more  than  constant  independence.  One  application  of  their  techniques  is  an  NC  algorithm 
for  the  set  discrepancy  problem,  which  can  be  used  to  obtain  many  other  NC  algorithms, 
including  a  better  NC  edge  coloring  algorithm.  As  another  application  of  their  techniques, 
they  provided  an  NC  algorithm  for  a  hypergraph  coloring  problem. 
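
A classical small-scale instance of removing randomness by limited independence may help fix ideas. It is far simpler than the polylogarithmic-independence framework of [49] and is offered only as an illustration. If each vertex of a graph is placed on a random side using pairwise-independent bits, every edge is cut with probability 1/2, so some seed cuts at least half the edges. Because the bits are generated from a seed of only about log n bits, all seeds can be tried deterministically. The names below are hypothetical.

```python
def pairwise_bits(seed, count):
    """Bit i is the parity of (seed AND (i + 1)).

    For a uniformly random seed, distinct nonzero masks give bits that are
    uniform and pairwise independent.
    """
    return [bin(seed & (i + 1)).count("1") % 2 for i in range(count)]


def derandomized_cut(num_vertices, edges):
    """Try every seed and return (side, size) for the largest cut found.

    The best seed cuts at least half of the edges, because the average over
    all seeds is exactly half.
    """
    width = num_vertices.bit_length()  # masks 1..num_vertices fit in `width` bits
    best_side, best_size = None, -1
    for seed in range(2 ** width):
        side = pairwise_bits(seed, num_vertices)
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_side, best_size = side, size
    return best_side, best_size
```

The seed space has at most 2n elements for n vertices, so the exhaustive search is polynomial, and the seeds can be examined independently of one another in parallel.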

Rompel,  working  with  Berger  and  Peter  Shor  [50],  gave  NC  approximation  algorithms  for 
the  unweighted  and  weighted  set  cover  problems.  Their  algorithms  use  a  linear  number 
of  processors  and  give  a  cover  that  has  at  most  logn  times  the  optimal  size/weight,  thus 
matching  the  performance  of  the  best  sequential  algorithms.  Previously,  there  were  no  known 
parallel  algorithms  for  the  general  set  cover  problem.  Berger,  Rompel,  and  Shor  devised  a 
randomized  algorithm,  depending  on  only  pairwise  independence,  and  then  converted  it  to 
a  deterministic  one.  Furthermore,  they  applied  their  set  cover  algorithm  to  learning  theory, 
giving  an  NC  algorithm  to  learn  the  concept  class  obtained  by  taking  the  closure  under 
finite  union  or  finite  intersection  of  any  concept  class  of  finite  VC-dimension  which  has  an 
NC  hypothesis  finder.  In  addition,  they  gave  a  linear-processor  NC  algorithm  for  a  variant 
of  the  set  cover  problem  first  proposed  by  Chazelle  and  Friedman,  and  used  it  to  obtain  NC 
algorithms  for  several  problems  in  computational  geometry. 
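
For reference, the sequential baseline against which these parallel results are measured is the classical greedy heuristic, which repeatedly picks the set with the lowest weight per newly covered element and achieves a logarithmic approximation factor. The Python sketch below shows that sequential heuristic only; it is not the NC algorithm of Berger, Rompel, and Shor, and the function name is hypothetical.

```python
def greedy_set_cover(universe, subsets, weights=None):
    """Return indices of chosen subsets covering `universe` (greedy heuristic).

    With weights=None every subset has weight 1 (the unweighted problem).
    """
    if weights is None:
        weights = [1.0] * len(subsets)
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_index, best_ratio = None, None
        for index, subset in enumerate(subsets):
            gain = len(uncovered & set(subset))
            if gain == 0:
                continue
            ratio = weights[index] / gain  # cost per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best_index, best_ratio = index, ratio
        if best_index is None:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best_index)
        uncovered -= set(subsets[best_index])
    return chosen
```

The loop is inherently sequential, since each choice depends on what the previous choices covered; that dependence is what makes a parallel algorithm with the same guarantee nontrivial.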

The  other  line  of  Rompel’s  research  concerns  the  cryptographic  applications  of  low  indepen¬ 
dence  hash  functions.  The  first  such  application  he  considered  was  that  of  digital  signatures. 
He  showed  in  [255]  that  secure  signature  schemes  could  be  constructed  from  any  one-way 
function.  This  improved  the  best  previous  result,  due  to  Naor  and  Yung,  that  one-way  per¬ 
mutations  suffice  to  construct  signatures.  Furthermore,  his  result  is  optimal,  since  it  was 
known that one-way functions were necessary for signatures. The construction starts with an
arbitrary  one-way  function  and,  after  a  series  of  six  steps,  ends  with  a  one-way  hash  function 
(which  Merkle  showed  can  be  used  to  construct  signatures).  Most  of  the  steps  are  based  on 
the  use  of  low  independence  hash  functions  to  alter  the  probability  distribution  induced  by 
a  function. 

Another  application  of  low  independence  hash  functions,  explored  in  joint  work  with  Bellare 
and  Goldwasser,  is  amplifying  the  error  probability  of  interactive  proofs  without  increasing 
the  randomness  required  [42].  For  this  construction,  low  independence  hash  functions  are 
used  to  randomly  sample  a  function.  By  taking  successively  smaller  samples  using  succes¬ 
sively  larger  independence,  they  are  able  to  exploit  a  tradeoff,  keeping  the  error  probability 
and  number  of  random  bits  constant. 

Rompel  is  currently  writing  his  Ph.D.  thesis  based  on  his  work  on  low  independence  random¬ 
ness,  which  he  plans  to  finish  in  August.  After  that,  he  plans  to  do  postdoctoral  research, 
under  an  NSF  Fellowship,  at  the  International  Computer  Science  Institute  in  Berkeley,  CA. 

Arie  Rudich 

Rudich has been working with Meyer on semantics for dataflow networks. Specifically, he is
studying  various  notions  of  observing  “completion.”  Rudich  finished  his  Master’s  thesis  this 
year.  The  thesis  investigates  a  notion  of  “completion,”  based  on  observing  fair  computations. 
It  presents  a  semantics  for  nondeterministic  dataflow  networks  which  is  fully  abstract  with 
respect  to  observing  finite  input-output  relations  of  fair  computations.  This  semantics  thus 
reflects  both  safety  and  liveness  properties  of  network  computations  with  respect  to  finite 
observations. Prior models [172][242][210][153][274] required observing infinite behaviors to
accommodate  liveness  properties. 

Rudich  also  studied  algorithms  for  inference  of  finite  automata,  trying  to  extend  the  ideas 
to  infer  nondeterministic  and  infinite  automata. 

Robert  E.  Schapire 

Schapire  continued  to  work  on  problems  relevant  to  the  distribution-free  (“PAC”)  learning 
model  introduced  by  Valiant  [287].  In  particular,  he  considered  the  problem  of  improving 
the  accuracy  of  a  hypothesis  output  by  a  learning  algorithm  in  this  model,  and  has  shown 
that  a  model  of  learnability,  called  weak  learnability,  in  which  the  learner  is  only  required  to 
perform  slightly  better  than  random  guessing  is  as  strong  as  a  model  in  which  the  learner’s 
error  can  be  made  arbitrarily  small  [260].  His  result  may  have  significant  applications  as  a 
tool  for  efficiently  converting  a  mediocre  learning  algorithm  into  one  that  performs  extremely 
well.  His  result  also  has  some  unexpected  theoretical  consequences  relating  to  the  general 
complexity  of  learning  in  the  Valiant  model. 
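
Some intuition for why a slight edge over random guessing can be amplified comes from majority voting: if three hypotheses err independently, each with probability p < 1/2, their majority errs with probability 3p^2 - 2p^3, which is strictly smaller than p. The Python sketch below iterates this map. It is an idealized illustration under an independence assumption, not the construction of [260], which must arrange the example distributions carefully to obtain a comparable guarantee; the function names are hypothetical.

```python
def majority_error(p):
    """Error of a majority vote over three independent hypotheses with error p."""
    return 3 * p**2 - 2 * p**3


def levels_needed(p, target):
    """Number of nested majority-of-three levels needed to push error p down to target."""
    if not 0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 1/2")
    if target <= 0:
        raise ValueError("target must be positive")
    levels = 0
    while p > target:
        p = majority_error(p)
        levels += 1
    return levels
```

An error of exactly 1/2 is a fixed point of the map, which is why some advantage over random guessing, however small, is required.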

Schapire also looked at the problem of learning pattern languages, a simple class of languages
introduced by Angluin [12]. He was able to show in a very strong sense that such
languages are unlearnable in the PAC model (assuming P/poly ≠ NP/poly), regardless of
the  representation  used  by  the  learning  algorithm  [261]. 

With  Goldman  and  Kearns,  Schapire  has  been  investigating  other  aspects  of  the  PAC  model. 
They  developed  a  new  technique  for  learning  certain  functions  under  particular  fixed  but 
simple distributions by observing the statistical behavior of the function under simple
perturbations of the fixed distribution [124]. They have also been studying the complexity of
weakly  learning,  that  is,  of  obtaining  a  small  but  significant  advantage  in  prediction  [125]. 
Such  a  study  may  be  especially  relevant  in  settings  in  which  the  supply  of  examples  is  severely 
limited. 

With  Kearns,  Schapire  has  been  exploring  a  new  and  important  extension  to  the  Valiant 
model,  namely  to  the  problem  of  learning  concepts  that  may  exhibit  uncertain  or  proba¬ 
bilistic  behavior  on  some  examples  [158].  Such  probabilistic  concepts  arise  naturally  in  many 
situations,  such  as  weather  prediction,  where  the  measured  variables  and  their  accuracy  are 
insufficient  to  determine  the  outcome  with  certainty.  While  building  on  the  recent  results 
of  Haussler  [142]  on  the  sample  complexity  of  learning  in  probabilistic  settings,  Kearns  and 
Schapire  focus  primarily  on  the  design  of  efficient  algorithms  for  learning  probabilistic  con¬ 
cepts.  Their  work  also  extends  many  of  the  results  in  the  standard  PAC  model  to  the  new 
probabilistic  model. 

Finally, Schapire has been working with Goldman and Rivest on the problem of inferring a
binary  relation  between  n  objects  of  one  kind  and  m  of  another  [126].  This  can  be  viewed 
as  the  problem  of  inferring  an  n  x  m  binary  matrix.  Their  goal  has  been  to  minimize  the 
number  of  prediction  mistakes  made  by  a  learner  presented  with  such  a  matrix  one  entry  at 
a  time.  They  have  been  able  to  prove  numerous  upper  and  lower  mistake  bounds  for  several 
variations  of  this  problem. 

Leonard  Schulman 

In  the  summer  and  fall  of  1989,  Schulman  worked  with  W.  Goddard  on  general  sorting 
problems.  In  these  problems,  only  partial  information  is  required  about  the  order  relations 
among  given  elements;  the  goal  is  to  obtain  this  information  with  a  minimum  of  comparisons. 
Their  research  will  be  presented,  along  with  the  work  of  V.  King,  at  STOC  1990  [119]. 

In  the  fall  of  1989,  work  of  Schulman  and  Mansour  on  the  problem  of  sorting  in  parallel,  on 
a  ring  of  processors,  was  accepted  for  publication  in  Journal  of  Algorithms  [217]. 

Currently,  Schulman  is  working  on  problems  in  communication  and  circuit  complexity.  He 
is  guided  in  this  work  by  his  advisor  M.  Sipser,  and  by  M.  Karchmer.  He  intends  to  continue 
this  work  during  the  summer  of  1990. 

In  other  work,  Schulman,  with  D.  Kleitman,  M.  Klugerman,  W.  Goddard,  and  others,  has 
been  investigating  some  problems  in  combinatorial  geometry. 

Eric  J.  Schwabe 

This  year,  Schwabe  continued  his  study  of  the  butterfly  network  vs.  the  shuffle-exchange 
graph  as  interconnection  networks  for  parallel  computation.  Although  these  well-known 
hypercube-derived  networks  share  many  characteristics,  the  question  of  their  relative  com¬ 
putational  strength  has  remained  open. 

Earlier  work  of  his  narrowed  the  gap  between  the  two  networks  by  showing  that  a  class  of 
structured  hypercube  algorithms  which  could  be  simulated  efficiently  on  the  shuffle-exchange 
graph could in fact be simulated on the butterfly just as efficiently. This result was recently
submitted  for  publication  [262]. 

Schwabe  resolved  the  long-standing  open  question  of  the  relative  computational  strength  of 
these two networks by proving that any T-step computation on an N-node butterfly can
be simulated in O(T) steps on an N-node shuffle-exchange graph, and vice versa [263]. This
result  established  the  computational  equivalence  of  many  common  hypercube-derived  net¬ 
works,  and  in  addition  yielded  the  first  constant-slowdown  simulation  of  the  shuffle-exchange 
graph  on  the  hypercube. 

Over  the  next  year,  he  plans  to  work  with  Koch,  Leighton,  Maggs,  Rao,  and  Rosenberg 
[171]  on  a  joint  journal  submission  of  related  results  on  network  emulations,  while  contin¬ 
uing  to  study  problems  in  parallel  computation  on  hypercube-derived  networks.  He  also 
hopes  to  return  to  earlier  unfinished  work  concerning  networks  for  efficient  parallel  memory 
management. 

Mark  Smith 

Smith entered the department in September 1989, and he spent most of his time on coursework.
He plans to do readings in Circuit Complexity under the supervision of Mauricio
Karchmer  this  summer. 

Clifford  Stein 

Stein  has  been  working  on  developing  sequential  and  parallel  algorithms  for  combinatorial 
optimization  problems.  In  August,  he  completed  his  Master’s  thesis.  The  first  result  in  the 
thesis  is  a  parallel  algorithm  to  find  a  maximal  set  of  edge-disjoint  cycles  in  an  undirected 
graph in O(log n) time using m processors on a CRCW PRAM. He then uses this as a primitive
for finding a cycle cover containing O(m + n log n) edges in O(log^2 n) time on m processors.
The thesis also contains a result giving RNC algorithms for the assignment problem which
use a number of processors independent of the size of the largest number in the problem.
Stein has been working on submitting these results for publication [275][170][169].

Together  with  Philip  Klein  of  Brown  and  Eva  Tardos  of  Cornell,  Stein  developed  new  efficient 
approximation  algorithms  for  the  concurrent  multicommodity  flow  problem.  Besides  being 
an  important  problem  in  its  own  right,  the  concurrent  flow  problem  has  many  interesting 
applications.  Leighton  and  Rao  used  concurrent  flow  to  find  an  approximately  “sparsest 
cut”  in  a  graph,  and  thereby  approximately  solve  a  wide  variety  of  graph  problems,  in¬ 
cluding  minimum  feedback  arc  set,  minimum  cut  linear  arrangement,  and  minimum  area 
layout.  Raghavan  and  Thompson  used  concurrent  flow  to  approximately  solve  a  channel 
width minimization problem in VLSI. Klein, Stein, and Tardos give a fully polynomial
approximation scheme for concurrent flow. Their algorithm is simple, and as a corollary, they
get an O(m^2 log m) time algorithm to find an approximately sparsest cut in an m-edge graph,
and an O(k^{1.5} (m + n log n)) time algorithm to find an approximation to the channel width
minimization problem in an n-node, m-edge, k-channel graph.

Larry  J.  Stockmeyer 

Stockmeyer  has  been  working  with  Hagit  Attiya,  Cynthia  Dwork,  and  Nancy  Lynch  on  the 
time  to  reach  agreement  in  a  timing-based  model,  where  there  is  uncertainty  in  message 
delivery time and processor speeds, and where processors may fail by stopping. Upper and
lower  bounds,  tight  to  within  a  factor  of  2,  have  been  obtained.  Stockmeyer  is  a  member  of 
the  program  committee  for  the  1990  FOCS  Conference. 

David  Wald 

Wald  came  to  MIT  as  a  graduate  student  in  September  1989.  He  is  working  with  Meyer, 
examining  calculi  for  describing  concurrent  processes. 

Joel  Wein 

Wein  continued  to  work  on  several  aspects  of  parallel  computation  and  combinatorial  opti¬ 
mization.  He  extended  his  results  on  Las  Vegas  RNC  algorithms,  reported  on  in  the  last 
progress  report,  to  the  case  of  planar  multicommodity  flows  [291].  Working  with  Leighton 
and Shmoys, he developed randomized approximation algorithms for the problem of job shop
scheduling that achieve schedules within a factor of O(log^2 M) of optimal, where M is the
number of machines. The best previously known deterministic polynomial time algorithm
was an O(M) approximation.

In  the  area  of  practical  parallel  algorithms,  with  Zenios  of  the  University  of  Pennsylvania, 
he  developed  an  improved  Connection  Machine  algorithm  for  the  assignment  problem.  This 
algorithm,  a  modification  of  Bertsekas’  auction  algorithm,  exploits  two  different  levels  of 
parallelism  and  an  efficient  method  of  communicating  the  data  between  them  that  avoids 
the  need  to  use  the  router,  thus  yielding  significant  speedups  over  previous  implementations 
[292]. 

David  Williamson 

Williamson  spent  the  past  year  investigating  approximation  algorithms  for  several  different 
NP-complete  problems. 

The main focus of his work has been the Held-Karp heuristic for the Traveling Salesman
Problem  (TSP)  [143].  The  heuristic  computes  a  lower  bound  on  the  cost  of  the  optimal 
tour,  a  bound  which  in  practice  seems  to  be  very  tight.  Previously,  Shmoys  and  Williamson 
[270] showed that the Held-Karp heuristic on the symmetric TSP with triangle inequality
has  a  certain  monotonicity  property;  namely,  the  bound  produced  by  the  heuristic  for  a 
subset  of  a  particular  input  is  no  greater  than  that  produced  for  the  input  itself.  This 
past  year,  Williamson  extended  these  results  to  show  that  the  monotonicity  property  also 
holds  for  the  asymmetric  case  with  triangle  inequality,  and  that  this  property  implies  that 
the Held-Karp heuristic produces a value that is no less than 1/⌈log n⌉ times the cost of
the  optimal  tour  in  the  asymmetric  case  with  triangle  inequality.  He  also  showed  that  in 
the  asymmetric  case,  the  Held-Karp  heuristic  produces  a  value  that  always  is  at  least  the 
value  given  by  another  lower-bound  heuristic  for  the  TSP  due  to  Balas  and  Christofides  [33]. 
Additionally,  he  demonstrated  that  there  are  a  number  of  other  equivalent  formulations  of 
the  Held-Karp  heuristic  in  addition  to  the  ones  shown  by  Held  and  Karp  in  their  original 
paper.  For  instance,  in  the  symmetric  case  with  triangle  inequality  and  non-negative  edge 
costs,  the  value  of  the  Held-Karp  heuristic  is  equal  to  the  value  of  the  linear  relaxation  of 
the  minimum-cost  biconnected-graph  problem.  Finally,  he  showed  that  in  the  Euclidean 
case, solutions produced by the Held-Karp heuristic are planar. These results are collected
in  Williamson’s  Master’s  thesis  [295],  which  he  completed  this  spring. 

Williamson  also  worked  on  flow-shop  scheduling,  another  NP-complete  problem.  A  common 
approach  to  approximating  the  best  flow-shop  schedule  is  to  find  a  good  permutation  of  jobs. 
Potts,  Shmoys,  and  Williamson  [250]  found  a  family  of  instances  for  which  this  approach  does 
poorly.  In  particular,  they  showed  that  for  these  instances  the  best  possible  permutation 
schedule is a factor of √m/2 worse than the overall best schedule, where m is the number
of  machines  in  the  instance. 

Finally,  Williamson  considered  a  problem  from  learning  theory  called  the  minimum  consistent 
subset  problem  under  the  nearest-neighbor  rule.  Given  a  collection  T  of  points  with  labels,  a 
consistent subset S ⊆ T is one that correctly classifies all the points of T under the
nearest-neighbor rule. Williamson showed that finding the smallest such set S is NP-complete,
and  also  that  it  seems  unlikely  that  there  is  any  approximation  algorithm  that  can  find  a 
consistent  subset  of  size  within  any  constant  factor  of  the  smallest  such  subset. 
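
The definition of a consistent subset can be stated operationally. The Python sketch below (an illustration only, with hypothetical names) classifies each point of T by the label of its nearest neighbor in S and then searches for a smallest consistent subset by brute force. Points are assumed to be tuples of numbers, distance is assumed to be Euclidean, and ties are broken in favor of the earliest point in S; none of these choices come from the work described.

```python
from itertools import combinations


def nearest_label(point, subset):
    """Label of the element of `subset` nearest to `point` (squared Euclidean distance)."""
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    nearest = min(subset, key=lambda item: dist2(point, item[0]))
    return nearest[1]


def is_consistent(subset, training):
    """True if every (point, label) in `training` is classified correctly by `subset`."""
    return all(nearest_label(point, subset) == label for point, label in training)


def smallest_consistent_subset(training):
    """Exhaustive search over subsets in order of increasing size (exponential time)."""
    for size in range(1, len(training) + 1):
        for candidate in combinations(training, size):
            if is_consistent(candidate, training):
                return list(candidate)
    return list(training)  # fallback if no subset is consistent under the tie rule
```

The exhaustive search takes time exponential in the size of T, which is in keeping with the hardness result above; it is practical only for very small inputs.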

Yiqun  Yin 

This year Yin spent most of her time on course work. She is currently interested in the
theoretical aspects of machine learning, and plans to do some research on bandit problems
during the summer.

at CRYPTO 89, August 1989.

[34]  S.  Micali.  How  to  collectively  flip  a  coin.  Lecture  given  at  Scuola  Normale  Superiore 
(July);  CRYPTO  89,  Santa  Barbara,  CA  (August),  1989. 

[35]  S.  Micali.  On-line/off-line  digital  signatures.  Lecture  given  at  CRYPTO  89,  August 
1989. 

[36]  S.  Micali.  An  optimal  algorithm  for  Byzantine  agreement  from  scratch.  Lecture  given 
at 16th International Colloquium on Automata, Languages and Programming, July
1989.  Also  given  at  Technion  (January);  Boston  University  (spring);  Carnegie  Mellon 
University  (October),  1989. 

[37] S. Micali. Perfect pseudo-random number generation. Lecture given at 16th International
Colloquium on Automata, Languages and Programming, July 1989.

[38] S. Micali. Perfect zero-knowledge, constant-round proof for graph isomorphism. Lecture
given  at  University  of  Toronto,  June  1989. 

[39]  S.  Micali.  Proving  properties  of  physically  hidden  data.  Lecture  given  at  International 
Computer  Science  Institute  (August);  Workshop  on  Cryptography  (September),  1989. 

[40]  S.  Micali.  Card  games  are  universal.  Lecture  given  at  University  of  Rome,  University 
of  Toronto,  January  1990. 

[41]  S.  Micali.  Interactive  proofs.  Lecture  given  at  American  Association  for  the  Advance¬ 
ment  of  Science,  February  1990. 

[42]  S.  Micali.  Probabilistic  proofs  and  their  applications.  Lecture  given  at  American 
Association  for  the  Advancement  of  Science  Meeting,  New  Orleans,  LA,  February  1990. 

[43]  S.  Micali.  Proofs,  knowledge  and  computation.  Lecture  given  at  Yale  University, 
February  1990. 

[44]  S.  Micali.  The  round  complexity  of  secure  protocols.  Lecture  given  at  Princeton 
University,  May  1990. 

[45] S. Micali. Verifiable secret sharing. Lecture given at MIT Laboratory for Computer
Science,  May  1990. 

[46]  S.  Micali.  Zero-knowledge  proofs  and  their  applications.  Lecture  given  at  Harvard 
University,  Yale  University,  April  1990. 

[47]  C.  Norton.  Using  separation  algorithms  in  fixed  dimension.  Lecture  given  at  First 
Annual  ACM-SIAM  Symposium  on  Discrete  Algorithms,  1990. 

[48] L. Levin and O. Goldreich. A hard-core predicate for all one-way functions. Lecture
given  at  Cryptography  Conference  at  Oberwolfach,  Germany,  September  1989. 

[49]  L.  Levin,  R.  Impagliazzo,  and  M.  Luby.  Pseudo-random  number  generation  from  any 
one-way  function.  Lecture  given  at  Minimal  Length  Encoding  Workshop,  Stanford, 
March  1990. 

[50] R. Ostrovsky. Comparison of bit-commitment and oblivious transfer protocols when
players have different computing power. Lecture given at DIMACS Workshop, Princeton,
NJ, October 1989.

[51] R. Ostrovsky. Efficient computation on oblivious RAMs. Lecture given at MIT Laboratory
for Computer Science, December 1989.

[52]  R.  Ostrovsky.  On  protection  of  traffic-patterns  in  a  distributed  database.  Lecture  given 
at  DIMACS  workshop,  Princeton,  NJ,  October  1989. 

[53]  R.  Ostrovsky.  Perfect  zero-knowledge  in  constant  rounds.  Lecture  given  at  Harvard 
University,  November  1989. 

[54]  J.  Park.  The  Monge  array:  An  abstraction  and  its  applications.  Lecture  given  at 
the  Ohio  State  University,  Columbus,  OH  (February);  Sandia  National  Laboratories, 
Albuquerque,  NM  (March),  1990. 

[55]  J.  Park.  Selection  and  sorting  in  totally  monotone  arrays.  Lecture  given  at  First 
Annual  ACM-SIAM  Symposium  on  Discrete  Algorithms,  1990. 

[56]  C.  G.  Plaxton.  Deterministic  sorting  in  nearly  logarithmic  time  on  the  hypercube  and 
related  computers.  Lecture  given  at  Cornell  University  (October  1989),  Duke  University 
(February  1990),  Harvard  University  (March  1990). 

[57]  C.  G.  Plaxton.  On  the  network  complexity  of  selection.  Lecture  given  at  IBM  Almaden 
Research  Center,  January  1990. 

[58]  J.  G.  Riecke.  A  complete  and  decidable  proof  system  for  call-by-value  equalities.
Lecture  given  at  University  of  Pennsylvania,  Philadelphia,  PA,  May  1990. 

[59]  R.  L.  Rivest.  Recent  developments  in  machine  learning  theory.  Lecture  given  at 
Bar-Ilan  University,  and  Foundations  of  Artificial  Intelligence  Conference,  June  1989. 
LCS  25th  Anniversary  Symposium,  November  1989.

[61]  R.  L.  Rivest.  Public-key  cryptography.  Lecture  given  at  SECURICOM  90  Conference,
Paris,  France,  March  1990.

[62]  R.  L.  Rivest.  Two  results  in  machine  learning  theory:  on  learning  binary  relations 
and  on  improving  a  learning  algorithm.  Lecture  given  at  MIT,  ARO  Research  Review, 
March  1990. 

[63]  P.  Rogaway.  Cryptographically  secure  distributed  computation  in  a  constant  number 
of  rounds.  Lecture  given  at  DIMACS  workshop,  Princeton,  October  1989. 

[64]  J.  Rompel.  Efficient  NC  algorithms  for  set  cover  with  applications  to  learning  and 
geometry.  Lecture  given  at  30th  Annual  Symposium  on  Foundations  of  Computer
Science,  Research  Triangle  Park,  NC,  October  1989. 

[65]  J.  Rompel.  One-way  functions  are  necessary  and  sufficient  for  secure  signatures.  Lecture
given  at  22nd  Annual  ACM  Symposium  on  Theory  of  Computing,  Baltimore,  MD  (May
1990);  University  of  California,  Berkeley  (November  1989). 

[66]  R.  E.  Schapire.  The  strength  of  weak  learnability.  Lecture  given  at  Second  Annual
Workshop  on  Computational  Learning  Theory,  30th  Annual  Symposium  on  Foundations
of  Computer  Science,  MIT  Laboratory  for  Computer  Science,  1989. 

[67]  R.  E.  Schapire.  Unsupervised  learning  of  deterministic  environments.  Lecture  given  at 
Harvard  University  Natural  Information  Processing  Seminar  series,  1989. 

[68]  L.  Schulman.  Optimal  randomized  algorithms  for  local  sorting  and  set-maxima.  Lecture 
given  at  the  22nd  Annual  ACM  Symposium  on  Theory  of  Computing,  May  1990. 

[69]  E.  J.  Schwabe.  On  the  computational  equivalence  of  hypercube-derived  networks.
Lecture  given  at  Second  Annual  ACM  Symposium  on  Parallel  Algorithms  and
Architectures,  July  1990.

[70]  E.  J.  Schwabe.  On  the  computational  equivalence  of  hypercube-derived  networks.
Lecture  given  at  Second  Annual  ACM  Symposium  on  Parallel  Algorithms  and
Architectures,  July  1990.

[71]  C.  Stein.  Leighton-Rao  might  be  practical:  faster  approximation  algorithms  for
concurrent  flow  with  uniform  capacities.  Lecture  given  at  22nd  Annual  ACM  Symposium
on  Theory  of  Computing,  May  1990.

[72]  C.  Stein.  A  new  parallel  graph  decomposition  technique  with  applications  to  finding  a 
cycle  cover.  Lecture  given  at  Fifth  SIAM  Conference  on  Discrete  Mathematics,  June 
1990. 


14.1  Introduction

The  Theory  of  Distributed  Systems  research  group  continued  its  work  on  algorithms  and
impossibility  results  for  distributed  problems,  as  well  as  its  work  on  modeling,  proof  techniques
and  applications.  Particular  highlights  this  year  include  results  on  timing-based  computing 
and  the  development  of  the  Spectrum  system  for  simulating  distributed  algorithms. 

14.2  Faculty  Reports 

Nancy  A.  Lynch 

This  year,  Nancy  Lynch  worked  mainly  on  upper  and  lower  time  bound  results  for
timing-based  and  asynchronous  systems,  as  well  as  the  development  of  models  and  proof  techniques
for  timing-based  systems.  First,  she  completed  work  with  Hagit  Attiya,  begun  last  year,  on 
upper  and  lower  bounds  for  the  mutual  exclusion  problem  in  a  timing-based  setting  [21]. 
Second,  again  working  with  Attiya,  she  developed  a  new  mapping  technique  for  proving 
correctness  and  timing  properties  of  timing-based  algorithms  [206].  Third,  she  worked  with
Attiya,  Dwork,  and  Stockmeyer  to  prove  upper  and  lower  time  bounds  for  the  problem  of 
distributed  consensus  in  the  presence  of  processor  faults  [20].  Fourth,  she  worked  with  Attiya 
and  Shavit  to  prove  upper  and  lower  time  bounds  for  the  problem  of  wait-free  approximate 
agreement.  The  main  point  of  this  work  is  to  show  that  wait-free  algorithms  are  inherently 
more  time  consuming  than  non-wait-free  algorithms,  even  in  the  “normal”  case  where  no 
processors  fail  [22].  The  last  three  of  these  sets  of  results  are  all  discussed  in  the  report  of 
Attiya. 

Lynch  continued  her  work  on  the  theory  of  atomic  transactions.  One  major  accomplishment, 
done  jointly  with  Alan  Fekete  and  Bill  Weihl  [111],  is  a  new  result  showing  how  some  of 
the  standard  techniques  of  the  “classical”  theory  of  database  concurrency  control  can  be 
used  to  help  prove  the  stronger  “user-view”  notion  of  database  correctness  described  in 
[110].  Another  accomplishment  is  an  almost-completed  revision  of  work  with  Ken  Goldman 
on  modeling  replicated  data  algorithms  [122].  Also,  she  helped  revise  an  earlier  paper  on 
modeling  locking  algorithms  [109],  for  publication  in  a  special  conference  issue  of  JCSS. 

Lynch  continued  her  work  of  last  year  on  algorithms  and  impossibility  results  for  data  link 
behavior.  The  work  presented  in  [207]  on  the  impossibility  of  implementing  reliable  data 
link  behavior  in  the  presence  of  crashes  was  simplified  and  submitted  for  publication.  With 
Fekete,  Lynch  obtained  a  new  result  showing  the  impossibility  of  transmitting  any  data
without  message  headers  [108].  Work  in  progress  involves  combining  the  remaining  result  of  [207],
showing  impossibility  of  reliable  data  link  message  transmission  in  the  presence  of  bounded
headers,  with  a  contrasting  result  of  Hagit  Attiya,  Mike  Fischer,  Lenore  Zuck  and  Da-Wei
Wang  in  which  such  message  delivery  is  achieved;  the  difference  is  that  the  impossibility
result  requires  an  assumption,  violated  by  the  algorithm,  that  the  “best  case”  message  delivery
time  be  bounded  by  a  constant.  An  interesting  sidelight  is  that  the  correctness  proof  of  this
algorithm  (which  is  not  at  all  obvious)  has  been  verified  automatically  by  Tobias  Nipkow,  a
visitor  from  Cambridge,  England,  using  the  Isabelle  automatic  theorem-prover.

Lynch  also  worked  with  Ken  Goldman  on  a  method  of  modeling  asynchronous  shared  memory 
algorithms  within  the  I/O  automaton  model  [122]. 

Lynch  began  a  consulting  project  with  Digital  Equipment  Corporation  on  modeling,
specification  and  verification  for  timing-dependent  communication  protocols,  and  one  with  Draper
Labs  on  fault-diagnosis  algorithms.

Lynch’s  professional  service  activities  included  the  following: 

1.  Working  on  the  committee  to  select  the  ACM  thesis  award  winners. 

2.  Formulating  (with  Mike  Fischer)  a  set  of  recommendations  to  the  Computing  Research 
Board  for  the  formation  of  a  new  Committee  on  the  Status  of  Women  in  Computer 
Science. 

3.  Serving  on  this  year’s  panel  to  select  the  Presidential  Young  Investigators. 

4.  Conducting  a  review  of  the  Computer  Science  Department  at  the  University  of
Tennessee.

5.  Planning  this  year’s  POCS  colloquium  series. 

With  Weihl,  Butler  Lampson,  and  John  Guttag,  Lynch  also  worked  on  developing  a  new 
course  on  “Principles  of  Computer  Systems.”  In  addition  to  supervising  her  own  students, 
she  also  served  as  thesis  reader  for  Bard  Bloom. 

14.3  Research  Associate  and  Student  Reports 

Hagit  Attiya 

Hagit  Attiya  continued  to  work  on  timing  properties  of  distributed  systems.  Together  with 
Nancy  Lynch,  Hagit  developed  a  new  technique  for  proving  timing  properties  for
timing-based  algorithms;  it  is  an  extension  of  the  mapping  techniques  previously  used  in  proofs
of  safety  properties  for  asynchronous  concurrent  systems.  The  key  to  the  method  is  a  way 
of  representing  a  system  with  timing  constraints  as  an  automaton  whose  state  includes 
predictive  timing  information.  Timing  assumptions  and  timing  requirements  for  the  system 
are  both  represented  in  this  way.  A  multivalued  mapping  from  the  “assumptions  automaton” 
to  the  “requirements  automaton”  is  then  used  to  show  that  the  given  system  satisfies  the 
requirements.  The  technique  is  illustrated  with  two  simple  examples,  a  resource  manager 
and  a  signal  relay  system,  and  a  third,  more  complex  example  of  a  two-process  race  system. 
The  technique  is  shown  to  be  complete;  that  is,  if  some  automaton  with  certain  timing
assumptions  has  certain  timing  behavior,  then  there  exists  a  mapping  from  the  “assumptions
automaton”  to  the  “requirements  automaton.”  (These  results  appear  in  [206].) 

Together  with  Cynthia  Dwork,  Nancy  Lynch  and  Larry  Stockmeyer,  Hagit  studied  upper  and 
lower  bounds  for  the  time  complexity  of  the  problem  of  reaching  agreement  in  a  distributed 
network,  in  the  presence  of  process  failures  and  uncertain  information  about  time  [20].  It  is 
assumed  that  the  amount  of  (real)  time  between  any  two  consecutive  steps  of  any  nonfaulty
process  is  at  least  $c_1$  and  at  most  $c_2$;  thus,  $C = c_2/c_1$  is  a  measure  of  the  timing  uncertainty.
It  is  also  assumed  that  the  time  for  message  delivery  is  at  most  $d$.  Processes  are  assumed
to  fail  by  stopping,  so  that  process  failures  can  be  detected  by  timeouts.  A  straightforward
adaptation  of  a  $(t+1)$-round  synchronous  agreement  algorithm  takes  time  $(t+1)Cd$  if
there  are  $t$  faults,  while  a  straightforward  reduction  from  a  timing-based  algorithm  to  a
synchronous  algorithm  yields  a  lower  bound  of  $(t+1)d$.  The  main  result  is  an  agreement
algorithm  in  which  the  uncertainty  factor  $C$  is  only  incurred  for  one  round,  yielding  a  running
time  of  approximately  $2td + Cd$  in  the  worst  case.  A  second  result  shows  that  any  agreement
algorithm  must  take  time  at  least,  approximately,  $(t-1)d + Cd$  in  the  worst  case.
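
To make the gap between these bounds concrete, the short Python sketch below evaluates the three expressions quoted above for sample parameters. The function names and the numeric values of c1, c2 and d are illustrative assumptions, not figures from [20]; the formulas are the approximate bounds as stated in this report.

```python
def naive_upper_bound(t, C, d):
    """Time of the straightforward adaptation of a (t+1)-round synchronous algorithm."""
    return (t + 1) * C * d


def improved_upper_bound(t, C, d):
    """Approximate worst-case time when the factor C is paid for one round only."""
    return 2 * t * d + C * d


def lower_bound(t, C, d):
    """Approximate worst-case lower bound for any agreement algorithm."""
    return (t - 1) * d + C * d


if __name__ == "__main__":
    # Hypothetical parameters: step time between c1 and c2, message delay at most d.
    c1, c2, d = 1.0, 10.0, 100.0
    C = c2 / c1  # timing uncertainty
    for t in (1, 5, 20):
        print(t, naive_upper_bound(t, C, d),
              improved_upper_bound(t, C, d), lower_bound(t, C, d))
```

With these sample values the naive bound grows as the product of t and C, while the improved bound and the lower bound both grow as a sum of a term in t and a term in C.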

Hagit  also  continued  to  study  the  issue  of  fault-tolerance  in  various  asynchronous  distributed
systems. 

Together  with  Yehuda  Afek,  Danny  Dolev,  Eli  Gafni,  Michael  Merritt  and  Nir  Shavit,  Hagit 
developed  a  wait-free  algorithm  for  obtaining  atomic  snapshots  of  shared  memory,  using  only
a  bounded  amount  of  additional  memory  [2].  An  atomic  snapshot  memory  is  a  shared  data
structure  allowing  concurrent  processes  to  store  information  in  a  collection  of  shared  registers,
all  of  which  may  be  read  in  a  single  atomic  scan  operation.  They  present  three  wait-free
implementations  of  atomic  snapshot  memory.  Two  constructions  implement  wait-free
single-writer  atomic  snapshot  memory  from  wait-free  atomic  single-writer,  n-reader  registers.  A
third  construction  implements  a  wait-free  n-writer  atomic  snapshot  memory  from  n-writer,
n-reader  registers.  The  first  implementation  uses  unbounded  (integer)  fields  in  these  registers,
while  the  other  implementations  use  only  bounded  registers.  All  operations  require  $O(n^2)$
reads  and  writes  to  the  component  shared  registers  in  the  worst  case.
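
The flavor of a single-writer snapshot with unbounded sequence numbers can be sketched in Python as follows. This is a minimal illustration of the widely taught "double collect with borrowed views" idea, written under the assumption that it resembles the unbounded construction; it is not taken from [2], the class and method names are hypothetical, and the code is exercised here only sequentially, so it shows the control flow rather than a concurrent implementation.

```python
from typing import Any, List, NamedTuple, Tuple


class Register(NamedTuple):
    value: Any
    seq: int               # unbounded sequence number, bumped on every update
    view: Tuple[Any, ...]  # snapshot obtained by the writer's embedded scan


class SnapshotMemory:
    """Illustrative single-writer snapshot object over n registers."""

    def __init__(self, n: int, initial: Any = None) -> None:
        self.n = n
        empty_view = tuple([initial] * n)
        self.regs: List[Register] = [Register(initial, 0, empty_view) for _ in range(n)]

    def _collect(self) -> List[Register]:
        # Read the n registers one at a time.
        return [self.regs[j] for j in range(self.n)]

    def scan(self) -> Tuple[Any, ...]:
        moved = [False] * self.n
        while True:
            a = self._collect()
            b = self._collect()
            changed = [j for j in range(self.n) if a[j].seq != b[j].seq]
            if not changed:
                # Two identical collects: nothing moved in between.
                return tuple(r.value for r in b)
            for j in changed:
                if moved[j]:
                    # Writer j was seen to move twice, so its latest embedded
                    # scan ran entirely inside this scan: borrow its view.
                    return b[j].view
                moved[j] = True

    def update(self, i: int, value: Any) -> None:
        view = self.scan()  # embedded scan
        old = self.regs[i]
        self.regs[i] = Register(value, old.seq + 1, view)
```

Each pass of the loop either returns or marks at least one new writer as moved, so a scan performs at most n + 1 double collects of 2n reads each, which is consistent with the quadratic cost quoted above.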

In  joint  work  with  Amotz  Bar-Noy  and  Danny  Dolev,  Hagit  developed  a  method  of
emulating  wait-free  asynchronous  algorithms  that  communicate  via  shared  memory  in  two  different
message-passing  systems  [19].  The  two  message-passing  models  considered  are  a  complete
network  with  processor  failures  and  an  arbitrary  network  with  dynamic  link  failures.  The
emulations  are  achieved  by  implementing  a  wait-free,  atomic,  single-writer  multi-reader
register  in  unreliable,  asynchronous  networks.  The  overhead  introduced  by  these  emulations
is  polynomial  in  the  number  of  processors  in  the  systems.  Any  wait-free  algorithm  based
on  atomic,  single-writer  multi-reader  registers  can  be  automatically  emulated  in
message-passing  systems.  Immediate  new  results  are  obtained  by  applying  the  emulators  to  known
shared-memory  algorithms.  These  include,  among  others,  protocols  to  solve  the  following
problems  in  the  message-passing  model  in  the  presence  of  processor  or  link  failures:
multi-writer  multi-reader  atomic  registers,  concurrent  time-stamp  systems,  $\ell$-exclusion,  atomic
snapshots,  randomized  consensus,  and  implementation  of  a  class  of  data  structures.
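
One common way to emulate a single-writer multi-reader register over a complete network with crash failures is to store timestamped copies at every processor and to have each operation contact a majority. The Python sketch below illustrates that majority-quorum idea in a synchronous, single-threaded simulation. It is an assumption that this resembles the emulation of [19], which this report does not describe in detail, and all class and method names are hypothetical.

```python
import random


class Replica:
    """One processor's local copy of the register."""

    def __init__(self):
        self.ts, self.val, self.up = 0, None, True


class EmulatedRegister:
    """Single-writer register kept on n replicas, tolerating a minority of crashes."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.write_ts = 0  # timestamp counter owned by the single writer

    def _quorum(self):
        # Any majority of responsive replicas will do; pick one at random.
        live = [r for r in self.replicas if r.up]
        need = len(self.replicas) // 2 + 1
        if len(live) < need:
            raise RuntimeError("no majority reachable")
        return random.sample(live, need)

    def _store(self, ts, val):
        # Install (ts, val) at a majority, keeping the newest timestamp.
        for r in self._quorum():
            if ts > r.ts:
                r.ts, r.val = ts, val

    def write(self, val):
        self.write_ts += 1
        self._store(self.write_ts, val)

    def read(self):
        # Query a majority, take the newest value, and write it back so that
        # later reads cannot return an older value.
        ts, val = max(((r.ts, r.val) for r in self._quorum()), key=lambda p: p[0])
        self._store(ts, val)
        return val
```

Because any two majorities intersect, a read always meets at least one replica that holds the most recently completed write, whichever replicas have crashed.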

Together  with  Nancy  Lynch  and  Nir  Shavit,  Hagit  explored  the  time  complexity  of
wait-free  algorithms  for  approximate  agreement  in  “normal”  executions,  where  no  failures  occur
and  processes  operate  at  approximately  the  same  speed.  A  lower  bound  of  $\log n$  on  the
time  complexity  of  any  wait-free  algorithm  that  achieves  approximate  agreement  among  n
processes  is  proved.  In  contrast,  there  exists  a  (non-wait-free)  algorithm  that  solves  this
problem  in  constant  time.  This  implies  an  $\Omega(\log n)$  time  separation  between  the  wait-free  and
non-wait-free  computation  models.  Two  fast  wait-free  approximate  agreement  algorithms  are
presented,  a  constant  time  2-process  algorithm  and  an  $O(\log n)$  time  n-process  algorithm;
the  complexity  of  the  latter  algorithm  is  within  a  small  constant  of  the  lower  bound.  (These
results  appear  in  [22].) 

Cynthia  Dwork

(On  Sabbatical  from  IBM  Almaden) 

Since  arriving  at  MIT  in  September  1989,  Dwork  has:

1.  Improved  lower  bounds  on  connectivity  necessary  for  perfectly  secure  message
transmission  in  a  general  network  in  certain  adversary  models.

2.  Identified  and  studied  the  effects  of  cooperation  between  two  adversaries,  the  disruptor
and  the  listener,  on  secret  computation  and  secure  message  transmission.

3.  Defined  a  stronger  notion  of  secrecy  in  the  presence  of  Byzantine  faults  than  the  one 
generally  studied  in  the  literature,  and  obtained  protocols  for  verifiable  secret  sharing 
and  secret  computation  at  no  increase  in  processors,  and  at  no  significant  increase  in 
computation  or  communication  costs. 

4.  Obtained  almost  tight  upper  and  lower  bounds  on  the  time  needed  to  reach  Byzantine 
agreement  in  the  presence  of  fail-stop  faults  in  a  model  of  distributed  computation 
in  which  there  are  known  upper  bounds  on  message  delivery  time  and  known  upper 
and  lower  bounds  on  process  step  time.  This  work,  joint  with  Attiya,  Lynch,  and 
Stockmeyer,  is  discussed  in  the  report  of  Attiya. 

5.  Served  as  Program  Chairperson  for  the  Ninth  Annual  ACM  Symposium  on  Principles 
of  Distributed  Computing. 

6.  Revised  four  articles  in  response  to  referees’  comments.  Of  these,  “A  Time  Complexity
Gap  for  2-way  Probabilistic  Finite  State  Automata,”  written  with  Stockmeyer,  has
been  accepted  for  publication  in  SIAM  Journal  on  Computing;  and  “Shifting  Gears:
Changing  Algorithms  on  the  Fly  to  Expedite  Byzantine  Agreement,”  written  with 
Bar-Noy,  Dolev,  and  Strong,  has  been  accepted  for  publication  in  Information  and 
Computation. 

Alan  Fekete 

Alan  Fekete  (University  of  Sydney)  spent  six  weeks  in  July  and  August  1989  visiting  the
Theory  of  Distributed  Systems  research  group.  His  research  concentrated  on  reasoning  about
nested  transaction  systems  with  special  attention  given  to  the  following  topics:  verifying
replication  management  algorithms  with  weaker  correctness  conditions  than  external
consistent  serializability,  understanding  the  assumptions  made  in  conventional  serializability
theory,  and  optimistic  locking  techniques.  He  also  was  involved  in  research  on  the  possibility 
and  impossibility  of  communication  protocols  using  unreliable  communication  media. 

Ken  Goldman 

Kenneth  Goldman  is  a  Ph.D.  student  in  the  Theory  of  Distributed  Systems  research  group. 
His  thesis  presents  the  Spectrum  Simulation  System,  a  new  research  tool  for  the  design  and 
study  of  distributed  algorithms.  Based  on  the  formal  Input/Output  Automaton  model  of
Lynch  and  Tuttle,  this  research  tool  allows  one  to  express  distributed  algorithms  as
collections  of  I/O  automata  and  simulate  them  directly  in  terms  of  the  semantics  of  that  model.
This  permits  integration  of  algorithm  specification,  design,  debugging,  analysis,  and  proof
of  correctness  within  a  single  formal  framework  that  is  natural  for  describing  distributed
algorithms.  The  research  tool  provides  a  language  for  expressing  algorithms  as  I/O  automata,
a  simulator  for  generating  algorithm  executions,  and  a  graphics  interface  for  constructing 
systems  of  automata  and  observing  their  executions. 

Goldman  has  shown  that  the  properties  of  the  I/O  automaton  model  provide  a  solid
foundation  for  algorithm  development  tools.  For  example,  using  I/O  automaton  composition,
Spectrum  users  may  define  composed  types  hierarchically,  study  simulations  at  varying
levels  of  detail,  and  create  specialized  debugging  and  analysis  devices.  These  devices,  called
spectators,  are  written  in  the  Spectrum  language  just  as  any  other  system  component,  and
can  monitor  algorithm  executions  for  correctness  and  performance  without  interfering  with
the  algorithm.  Spectators  are  made  possible  only  by  the  nonblocking,  synchronous,
multiparty  communication  provided  in  the  Spectrum  system.  Also,  since  the  message  system  is
modeled  explicitly  as  an  automaton,  users  may  study  algorithms  under  different
communications  assumptions  simply  by  substituting  one  automaton  type  for  another.  Techniques
used  to  prove  the  correctness  of  an  algorithm  can  also  be  used  as  debugging  tools.  For 
example,  the  system  checks  state  invariants  during  execution  and  allows  users  to  roll  back 
an  execution  to  discover  the  source  of  errors.  Several  researchers  have  successfully  used  the 
system  to  simulate  and  debug  algorithms  (for  examples,  see  [138][191]). 
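
The ideas in this paragraph (composition of automata, a message system modeled as its own automaton, and a spectator that checks an invariant without interfering) can be illustrated with a small Python sketch. This is not the Spectrum language or its simulator; every class and function name below is hypothetical, and the scheduler is simply an assumed random choice among enabled output actions.

```python
import random


class Automaton:
    """Base class: `inputs` names the actions this automaton accepts from others."""
    inputs = frozenset()

    def enabled(self):
        """Locally controlled (output) actions whose precondition currently holds."""
        return []

    def step(self, name, arg):
        """Effect of an action, either one of our own outputs or an input."""


class Sender(Automaton):
    def __init__(self, count):
        self.next, self.count = 0, count

    def enabled(self):
        return [("send", self.next)] if self.next < self.count else []

    def step(self, name, arg):
        self.next += 1  # effect of our own send


class Channel(Automaton):
    """The message system, modeled explicitly as a FIFO automaton."""
    inputs = frozenset({"send"})

    def __init__(self):
        self.queue = []

    def enabled(self):
        return [("recv", self.queue[0])] if self.queue else []

    def step(self, name, arg):
        if name == "send":
            self.queue.append(arg)
        else:
            self.queue.pop(0)


class FifoSpectator(Automaton):
    """Observes send and recv actions and checks a state invariant; has no outputs."""
    inputs = frozenset({"send", "recv"})

    def __init__(self):
        self.sent, self.received = [], []

    def step(self, name, arg):
        (self.sent if name == "send" else self.received).append(arg)
        assert self.received == self.sent[:len(self.received)], "FIFO invariant violated"


def simulate(automata, seed=0, max_steps=1000):
    """Repeatedly fire one enabled output and deliver it to every listener at once."""
    rng = random.Random(seed)
    trace = []
    for _ in range(max_steps):
        choices = [(a, act) for a in automata for act in a.enabled()]
        if not choices:
            break
        owner, (name, arg) = rng.choice(choices)
        owner.step(name, arg)
        for a in automata:
            if a is not owner and name in a.inputs:
                a.step(name, arg)  # inputs are always accepted (nonblocking)
        trace.append((name, arg))
    return trace
```

Calling `simulate([Sender(3), Channel(), FifoSpectator()])` runs the composed system; replacing `Channel` with a different channel class changes the communication assumptions without touching the sender or the spectator.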

Also  in  Goldman’s  thesis,  the  I/O  automaton  model  is  extended  for  both  shared  memory 
[122]  and  superposition  [121].  Possible  extensions  of  the  Spectrum  Simulation  System  for 
shared  memory  and  superposition  are  discussed.  In  addition,  an  algorithm  for  distributing 
the  simulation  system  is  presented  in  [120]. 

Upon  completing  his  degree,  Goldman  will  join  the  faculty  in  the  Computer  Science
Department  at  Washington  University  in  St.  Louis.

John  Leo 

John  Leo  has  completed  his  Master’s  thesis  “Dynamic  Process  Creation  in  a  Static  Model.” 
The  thesis  shows  that  proofs  of  correctness  of  algorithms  involving  dynamic  process  creation 
and  changing  topologies  can  be  handled  rigorously  within  a  static  model,  in  particular,  I/O 
automata  [208].  It  is  also  shown  that  Actors  [9]  can  be  modeled  using  I/O  Automata. 
Additional  proof  techniques  are  developed  and  demonstrated. 

Steve  Ponzio 

In  addition  to  coursework  and  reading,  Stephen  continued  his  study  of  timing-based
algorithms.  He  extended  his  work  on  the  dining  philosophers  problem,  and  developed  a  model
of  bounded-capacity  message  links  for  which  he  showed  a  lower  bound  on  the  time  required 
to  detect  stopping  failures. 

Isaac  Saias 

Isaac  Saias  joined  the  Theory  of  Distributed  Systems  group  in  September  1989.  Isaac  is
currently  studying  the  time  complexity  of  Rabin’s  probabilistic  algorithm  for  mutual  exclusion.
In  this  work  he  derives  upper  bounds  for  the  expected  time  of  a  round  in  the  presence  of  an
adversarial  scheduler.

Isaac  also  worked  with  Nancy  Lynch  and  Stuart  Adams  on  the  modeling  of  a  complex 
system  built  up  of  potentially  failing  processors.  They  investigate  different  schedules  of
repairs,  given  partial  information  about  the  system,  in  order  to  maximize  its
expected  lifetime.

Ken  Streeter 

Kenneth  Streeter  is  continuing  work  on  his  Master’s  thesis  “A  Partitioned  Computation 
Machine.”  His  thesis  is  being  supervised  by  both  Nancy  Lynch  and  Paul  Brown,  his  company 
advisor  in  the  VI-A  Program  with  General  Electric  Corporate  Research  and  Development. 
The  thesis  develops  a  specification  language  which  utilizes  a  pictorial  representation  language
extending  Harel’s  statecharts  [140].  The  model  is  closely  related  to  I/O  automata  [209] 
and  could  be  used  as  a  visual  specification  language  for  I/O  automata.  A  formal  execution 
semantics  and  methods  of  additive  and  multiplicative  composition  of  partitioned  computation 
machines  are  developed  and  demonstrated. 

Greg  Troxel 

Greg  Troxel  worked  on  and  completed  his  Master’s  thesis:

An  algorithm  for  detecting  and  recovering  from  deadlock  in  a  system  using  remote
procedure  calls  is  presented,  along  with  a  proof  of  correctness.  The  proof  uses  the  I/O  automata
model  of  Lynch  and  Tuttle,  described  in  [209]  and  [208].  First,  correctness  conditions  for
the  problem  are  given  in  terms  of  I/O  automata.  Next,  a  high  level  graph-theoretic
representation  of  the  algorithm  is  shown  to  be  correct.  Then  a  lower  level  formulation  of  the
algorithm,  taking  into  account  its  distributed  nature,  is  shown  to  be  equivalent  to  the  higher 
level  representation,  and  thus  correct. 

In  giving  the  correctness  conditions,  we  introduce  client  automata,  which  model  the  behavior
of  the  user’s  program,  and  allow  almost  all  details  of  this  user  program  to  be  suppressed  at 
both  specification  and  proof  time. 

To  simplify  the  proof  of  the  high  level  version  of  the  algorithm,  safety  properties  are  proved 
with  a  simplified  version  of  the  algorithm.  Then,  the  algorithm  is  transformed  to  the  full 
version,  and  it  is  argued  that  the  safety  properties  hold  for  the  transformed  version. 

A  new  technique  that  can  be  used  either  for  expanding  the  number  of  algorithms  to  which 
a  proof  applies  or  for  simplifying  the  proof  that  a  lower  level  algorithm  solves  the  same 
problem  as  a  higher  level  one  is  presented.  This  is  effected  by  underspecifying  the  predicates 
used  in  the  preconditions  of  the  I/O  automata.  This  captures  the  idea  that  whether  or
not  the  algorithm  takes  a  certain  action  under  an  intermediate  range  of  conditions  does
not  affect  its  correctness.  This  lack  of  specification  can  be  carried  through  to  the  low  level
algorithm,  presumably  making  the  job  of  the  implementor  easier  or  allowing  more  efficient
code,  since  then  at  times  arbitrary  choices  may  be  made.  It  can  also  make  it  easier  to  show
that  the  low  level  algorithm  solves  the  same  problem  as  the  high  level  one.

The  proof  of  the  liveness  properties  of  the  high  level  version  of  the  algorithm  makes  use  of  a
metric  on  states  of  the  algorithm.  Rather  than  the  conventional  technique  of  claiming  that 
the  set  of  values  of  the  metric  is  well  founded,  we  show  that  every  subset  of  such  values 
occurring  in  a  particular  execution  of  the  algorithm  is  well  founded.  This  enables  us  to  allow 
the  user  program  to  request  an  arbitrary  but  finite  number  of  remote  procedure  calls  with 
arbitrary  arguments. 

We  then  present  the  low  level  version  of  the  algorithm,  along  with  specifications  for  the 
communications  network,  etc.  used  by  it.  A  proof  is  presented  showing  that  the  low  level 
version  of  the  algorithm  (together  with  the  network,  etc.)  is  equivalent  (from  the  point  of 
view  of  the  user’s  program)  to  the  high  level  version. 

Mark  Tuttle 

Mark  Tuttle  finished  his  Ph.D.  thesis  “Knowledge  and  Distributed  Computation”  in
September  1989.  The  topic  of  the  thesis  is  the  role  of  formal  definitions  of  knowledge  in  the  design
and  analysis  of  distributed  algorithms.  The  thesis  shows  how  reasoning  in  terms  of  standard
definitions  of  knowledge  can  lead  to  fast  solutions  to  problems  like  consensus  and  the
distributed  firing  squad  problem,  and  how  to  construct  new  definitions  of  knowledge  that  seem
to  be  useful  in  cryptography  and  areas  where  bounds  on  processors’  computational  powers 
limit  what  they  can  know. 

George  Varghese 

George  Varghese  joined  TDS  when  he  entered  MIT  as  a  full  time  graduate  student  in
February  1990.  He  worked  with  Nancy  Lynch,  and  Art  Harvey  and  Radia  Perlman  (at  DEC)  on
the  models  and  proofs  of  various  transport  and  routing  layer  protocols.  Recently,  he  has 
been  working  on  a  randomized  version  of  the  two  Generals  problem  to  produce  lower  and 
upper  bounds  on  probabilistic  safety  for  different  adversarial  models. 

Undergraduate 
Aparna  Gupta 

Aparna  worked  with  Nancy  Lynch  on  her  undergraduate  thesis  this  past  term.  Three
algorithms  (the  Hirschberg-Sinclair  leader  election,  the  Peterson  leader  election,  and  the
Dijkstra  shortest  paths  algorithm)  were  described  formally  using  Lynch  and  Tuttle’s  I/O
automaton  model.  These  descriptions  were  then  coded  into  Ken  Goldman’s  Spectrum
simulation  system  in  order  to  gain  an  intuitive  feel  for  the  working  of  the  algorithms.  The  final  thesis
discussed  three  points.  The  first  is  the  ease  of  translating  the  algorithms  into  the  I/O
automaton  model  and  into  the  Spectrum  programming  language,  and  what  is  gained  by  each
description.  The  second  is  possible  changes  to  the  Spectrum  interface  that  would
enhance  its  ease  of  use  and  utility.  The  third  is  recommendations  for  further  studies
facilitated  by  both  methods  of  description.

15.1  Introduction

The  MIT  X  Consortium  was  formed  in  January  1988  to  further  the  development  of  the 
X  Window  System.  The  major  goal  of  the  Consortium  is  to  promote  cooperation  within 
the  computer  industry  in  the  creation  of  standard  software  interfaces  at  all  layers  in  the  X 
Window  System  environment.  MIT’s  role  is  to  provide  the  vendor-neutral  architectural  and 
administrative  leadership  required  to  make  this  work.  The  Consortium  is  financially
self-supporting,  with  membership  open  to  any  organization.  At  present,  nearly  70  companies
belong  to  the  Consortium,  as  well  as  several  universities.  These  members  represent  the  bulk 
of  the  US  computer  industry,  as  well  as  a  considerable  segment  of  the  international  industry. 

15.2  Release  4  of  the  X  Window  System 

One  of  the  primary  tasks  of  the  Consortium  staff  is  the  maintenance  and  evolution  of  a 
software  distribution  containing  sample  implementations  of  all  interfaces  defined  by  the 
Consortium,  as  well  as  numerous  applications  and  utilities.  In  January  1990,  Release  4 
of  this  distribution,  consisting  of  50  megabytes  of  source  code  and  documentation,  was  made 
available  to  the  world,  along  with  a  companion  collection  of  90  megabytes  of  code  and
documentation  of  user-contributed  software.  The  distribution  is  available  using  anonymous  FTP
from  numerous  Internet  sites,  and  on  magnetic  tape  from  the  MIT  Software  Center.  Some 
of  the  major  improvements  in  Release  4: 

Keith  Packard  and  Bob  Scheifler  implemented  a  significantly  faster  and  smaller  X  server; 
more  detail  is  provided  in  the  next  section. 

Full  font  support  was  added  for  the  X  Consortium  standard  X  Logical  Font  Description 
conventions,  and  a  fairly  rich  set  of  fonts  was  added  for  both  75  and  100  dots  per  inch  displays. 
Font  donations  came  from  Adobe  Systems,  Digital  Equipment  Corporation,  Bigelow  and 
Holmes,  Sun  Microsystems,  and  Sony  Corporation. 
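
An X Logical Font Description name packs fourteen hyphen-separated fields, including the horizontal and vertical resolution for which the font was designed, which is how 75 and 100 dots per inch variants are told apart. The Python sketch below splits such a name into labeled fields. The helper name `parse_xlfd` and the lowercase field labels are illustrative choices, the sample font names are given only as examples, and the parser deliberately ignores wildcards and other special forms.

```python
XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
)


def parse_xlfd(name):
    """Split a fully specified XLFD font name into a dict of its 14 fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD font name starts with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


if __name__ == "__main__":
    # Example names for the same 12-point face at two design resolutions.
    for name in (
        "-adobe-helvetica-bold-r-normal--12-120-75-75-p-70-iso8859-1",
        "-adobe-helvetica-bold-r-normal--17-120-100-100-p-92-iso8859-1",
    ):
        f = parse_xlfd(name)
        # point_size is expressed in tenths of a point.
        print(f["family_name"], int(f["point_size"]) / 10, "pt at",
              f["resolution_x"], "dpi =", f["pixel_size"], "pixels")
```

The point size is the same in both names while the pixel size differs, which is why separate font sets are supplied for the two display resolutions.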

A  major  revision  of  the  Xt  Intrinsics  has  consolidated  several  independent  extensions
undertaken  by  industry  groups  in  support  of  product-quality  toolkits  and  modern  graphical  user
interfaces.  The  most  significant  addition  is  support  for  windowless  widgets  (called
“gadgets”),  as  well  as  other  resource-based  non-windowed  objects  for  general  programming.  In
addition,  varargs-style  interfaces,  better  caching  of  resources,  support  for  incremental
selections,  improved  error  reporting,  support  for  passive  device  grabs,  and  a  class  extension
mechanism  were  added. 

Chris  Peterson  completely  redesigned  the  Text  widget  in  the  Athena  Widget  Set  using  the 
new  object  mechanisms  in  the  Xt  Intrinsics,  providing  a  clean  interface  between  the  text 
source  and  sink,  and  significantly  improving  the  internal  text  and  resource  management. 
New  functionality  was  added  to  provide  previously  missing  features,  most  notably  search 
and  replace.  Chris  reimplemented  the  Simple  Menu  widget  to  use  the  new  object
mechanisms,  providing  a  much  simpler  programming  interface,  as  well  as  reducing  the  amount
and  complexity  of  the  code.  Several  of  our  applications  have  been  converted  to  use  the  new 
menu  facilities.  Chris  also  completely  rewrote  the  reference  manual  for  the  Athena  Widgets, 
making  it  much  easier  to  use  and  making  it  reflect  the  true  interface  provided  by  the  toolkit. 

Ralph  Swick  of  Project  Athena  and  Keith  Packard  incorporated  support  for  non-rectangular
windows  into  the  Athena  Widget  Set  and  several  applications,  using  the  new  X  Consortium 
standard  SHAPE  extension.  Round  clocks  and  oval  buttons  are  two  pleasant  outcomes  of 
this  work. 

Jim  Fulton  rewrote  most  of  the  “twm”  window  manager  (the  program  that  people  use  to
manipulate  windows  on  the  screen),  providing  X  with  the  first  public  user  interface  that
implemented  the  guidelines  described  in  the  X  Consortium  standard  Inter-client  Communication
Conventions  Manual.  The  window  manager  was  also  revised  to  support  non-rectangular 
application  windows,  “tab”-style  title  bars,  and  non-rectangular  icons,  all  using  the  new 
SHAPE  extension. 

Keith  Packard  did  a  major  overhaul  of  the  “xdm”  display  manager  daemon,  cutting  the
number  of  processes  used  in  half,  improving  the  robustness  and,  most  importantly,  implementing
the  X  Consortium  standard  X  Display  Manager  Control  Protocol  (XDMCP).  XDMCP  is 
designed  to  make  X  terminals  as  easy  to  use  as  traditional  character  cell  terminals,  and  to 
provide  a  vendor-neutral  mechanism  for  reducing  the  administrative  overhead  involved  in 
managing  a  large  network  of  X  terminals  connected  to  central  compute  servers. 

Jim  Fulton  enhanced  the  “xterm”  terminal  emulator  to  support  8-bit  characters,  allowing 
the  full  ISO  Latin-1  character  set  to  be  used.  Chris  Peterson  similarly  enhanced  the  Athena 
Text  Widget.  Bob  Scheifler  added  simple  bilingual  keyboard  support  to  the  Xlib  and  Xt 
Intrinsics  libraries. 

Donna  Converse  substantially  reworked  the  user  interface  to  the  “xmh”  mail  handler
program.  Visually,  the  interface  is  simpler,  and  it  is  highly  configurable.  Functionally,  xmh  is
now  more  powerful,  making  it  easier  for  us  to  process  many  electronic  mail  messages  each 
day.  The  new  xmh  makes  use  of  nearly  every  widget  in  the  Athena  Widget  Set  and  adheres 
to  the  Inter-client  Communication  Conventions. 

Chris  Peterson  made  the  “xman”  manual  page  browser  considerably  easier  to  use,  by  making 
use  of  the  new  menus,  adding  keyboard  accelerators,  and  providing  a  simple  search  facility. 

Donna  Converse  also  completely  reimplemented  the  interface  to  the  “xcalc”  calculator
program,  making  it  a  model  demonstration  of  the  power  of  application  default  resource  files  in
allowing  end-user  customization. 

Keith  Packard  implemented  an  “xditview”  application  for  displaying  ditroff  DVI  files  on  an 
X  display. 

Bob  Scheifler  improved  the  “xwud”  image  display  program  to  work  with  various  visual  types 
and  with  standard  colormaps,  and  implemented  a  simple  grayscale  conversion  algorithm  for 
displaying  color  images  on  a  monochrome  screen. 

Jim  Fulton  integrated  client-side  support  for  System  V  Release  3.2.  This  allows  X
applications  to  be  run  on  personal  computers  running  the  System  V  operating  system,  and  also
allows  X  applications  to  be  run  on  Cray  supercomputers. 

Donna  Converse  and  Ralph  Swick  added  support  for  using  ANSI  C  function  prototypes,  and
made  the  major  C  header  files  usable  from  C++  as  well.  Donna  fixed  the  Xlib
implementation  to  deal  gracefully  with  memory  allocation  failures.

15.3  X  Server  Optimizations

Work  on  the  X  sample  server  over  the  past  year  has  focused  on  performance  issues.  Bob
Scheifler  and  Keith  Packard  completed  work  on  new  data  structures  for  the  major  server  resources
(principally  windows,  regions,  and  graphics  contexts),  resulting  in  a  one-half  to  two-thirds
reduction  in  total  data  space  in  a  typical  running  server.  This  is  an  extremely  important  gain
for  low-end  X  terminals  with  limited  memory.

Keith  Packard  and  Bob  Scheifler,  along  with  Joel  McCormack  of  Digital  Equipment
Corporation,  collaborated  in  designing  and  implementing  new  algorithms  for  manipulating  the
window  hierarchy.  Common  window  operations,  such  as  create,  map,  unmap,  move,  and
resize,  were  all  sped  up  by  factors  of  between  two  and  twenty.  Data  structure  redesign
contributed  to  the  performance  increases  for  window  creation  and  destruction;  windows  were
redesigned  so  that  instead  of  being  composed  of  many  small,  independently  allocated  pieces, 
a  single  allocation  could  be  used  to  allocate  the  entire  contents  of  the  most  common  form  of 
window. 

Keith  Packard  rewrote  most  of  the  device-dependent  code  for  8-bit  color  frame  buffers,
resulting  in  dramatic  performance  improvements  (up  to  two  orders  of  magnitude,  in  some  cases)
for  points,  lines,  filled  areas,  text,  area  copies,  and  scrolling.  Bob  Scheifler  reimplemented
zero-width  arcs  and  filled  arcs,  using  integer  algorithms  that  run  up  to  500  times  faster 
than  the  previous  algorithms.  Keith  and  Bob  derived  an  efficient  exact  integer  algorithm 
for  scan-converting  wide  lines  with  correct  pixelization  (previous  algorithms  were  sometimes 
incorrect);  Keith’s  implementation  resulted  in  an  order  of  magnitude  speedup. 

One  of  the  performance  problems  not  solved  by  Release  4  was  efficiently  dealing  (in  both 
time  and  space)  with  all  combinations  of  16  logical  raster  operations,  in  conjunction  with 
arbitrary  plane  masks.  Either  many  copies  of  each  piece  of  code  must  be  compiled  (wasteful 
in  space)  or  a  runtime  switch  on  the  operation  must  be  made  for  each  pixel  (wasteful  in 
time).  Since  Release  4,  Keith  Packard  devised  and  implemented  a  raster  operation  reduction 
scheme,  which  converts  an  (operation,  plane  mask)  pair  into  the  following  boolean  equation:

dst  and  (src  and  a1  xor  x1)  xor  (src  and  a2  xor  x2)

where  a1,  a2,  x1,  x2,  and  m  are  constants.  Although  this  equation  may  look  complex,  it  is
substantially  faster  than  switching  on  the  operation  for  each  pixel.  This  equation  can  also 
be  reduced  further  in  many  common  cases.  For  example,  when  the  source  is  a  constant,  it 
reduces  to: 

dst  and  av  xor  xv 

The  general  equation  can  be  reduced  for  the  most  common  cases,  and  these  can  be  compiled 
individually.  For  modern  RISC  processors,  it  is  usually  possible  to  perform  several  register 
operations  between  memory  writes  without  slowing  down,  so  these  algorithms  can  often  still 
run  at  memory  speeds. 
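
As a minimal sketch of this reduction (not the sample server's actual code), the Python below derives the four constants for each of the 16 raster operation codes and checks the single boolean expression against a bit-by-bit evaluation. The function names, the 32-bit word size, and in particular the way the plane mask m is folded into the expression are assumptions made for illustration; the operation codes follow the X protocol's four-bit encoding (Clear = 0, And = 1, Copy = 3, Xor = 6, Set = 15, and so on).

```python
MASK32 = 0xFFFFFFFF


def rop_bit(rop, s, d):
    """Result bit of raster operation code `rop` for source bit s and destination bit d."""
    return (rop >> (3 - (2 * s + d))) & 1


def _spread(bit):
    """Replicate a single bit across a 32-bit word."""
    return MASK32 if bit else 0


def reduce_rop(rop):
    """Return (a1, x1, a2, x2) with result = (dst & ((src & a1) ^ x1)) ^ ((src & a2) ^ x2)."""
    and0 = rop_bit(rop, 0, 0) ^ rop_bit(rop, 0, 1)  # AND term needed where the source bit is 0
    and1 = rop_bit(rop, 1, 0) ^ rop_bit(rop, 1, 1)  # AND term needed where the source bit is 1
    xor0 = rop_bit(rop, 0, 0)                       # XOR term needed where the source bit is 0
    xor1 = rop_bit(rop, 1, 0)                       # XOR term needed where the source bit is 1
    return (_spread(and0 ^ and1), _spread(and0), _spread(xor0 ^ xor1), _spread(xor0))


def merge(consts, src, dst, m=MASK32):
    """Apply the reduced operation; bits outside plane mask m keep their dst value (assumed form)."""
    a1, x1, a2, x2 = consts
    keep = ~m & MASK32
    return (dst & (((src & a1) ^ x1) | keep)) ^ (((src & a2) ^ x2) & m)


def reference(rop, src, dst, m=MASK32):
    """Slow bit-by-bit evaluation, standing in for a per-pixel switch on the operation."""
    out = 0
    for i in range(32):
        s, d = (src >> i) & 1, (dst >> i) & 1
        bit = rop_bit(rop, s, d) if (m >> i) & 1 else d
        out |= bit << i
    return out
```

When the source is constant, the two parenthesized terms can be computed once, which gives the two-constant form `dst and av xor xv` quoted above.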

15.4  Internationalization 

Internationalizing  a  program  means  making  it  adaptable  to  the  requirements  of  different 
native  languages,  local  customs,  and  coded  character  sets,  so  that  the  program  can  be  run  in
different  locales  without  source  code  modification  or  recompilation.  Typically,  the  program
will  contain  some  collection  of  data  (e.g.,  various  strings)  that  needs  to  change  for  each
locale;  this  data  usually  must  be  external  to  the  executable,  so  that  it  can  be  tailored  to  and 
dynamically  selected  for  the  desired  locale. 

One  of  the  major  areas  of  work  within  the  X  Consortium  over  the  past  year  has  been 
internationalization.  This  work  has  focused  on  changes  to  Xlib  and  the  Xt  Intrinsics  to 
support  internationalization.  The  work  divides  into  four  main  topics:  keyboard  input,  text 
display,  text  interchange,  and  resource  files.  A  high  priority  for  the  work  is  to  keep  the 
interfaces  and  specifications  as  simple  as  possible,  so  that  they  can  be  understood  and  used 
by  the  general  programming  population,  not  just  multi-lingual  programmers. 

Although  many  people  believe  that  X  should  be  designed  to  support  true  multi-lingual 
environments  (supporting  multiple  languages,  mixed  into  a  single  textual  context),  it  is  also 
a  goal  of  our  work  to  harmonize  with  existing  formal  standards  for  internationalization.  The 
most  important  standard  at  this  time  is  the  ANSI  C  standard,  and  its  locale  mechanism. 
Unfortunately,  this  mechanism  is  quite  biased  towards  a  mono-lingual  environment,  with  a 
single,  global  locale  affecting  all  internationalized  operations. 

For  keyboard  input,  the  most  important  piece  of  functionality  is  supporting  input  methods, 
where  multiple  keystrokes  are  used  to  produce  a  single  character  or  string  of  characters.  An 
example  of  a  very  simple  input  method  is  the  use  of  diacritical  marks;  typically  a  user  will 
type  a  base  letter  and  then  a  diacritical  mark  (or  vice  versa)  to  produce  a  single  character. 
Much  more  complicated  input  methods  are  used  in  Asia.  For  example,  in  Japan  it  is  common 
to  type  in  Romaji  (composed  of  Latin  letters)  which  is  converted  on  the  fly  to  Kana  characters 
and  displayed;  once  a  complete  word  (or  phrase  or  sentence)  is  entered,  a  conversion  key 
is  pressed  and  conversion  to  Kanji  takes  place.  There  may  be  several  Kanji  words  for  a 
given  Kana  representation,  and  the  user  may  have  to  choose  from  a  list  of  alternatives. 
This  conversion  process  typically  requires  fairly  complex  linguistic  mechanisms,  and  large 
dictionaries. 
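
The two kinds of input method described above can be sketched in a few lines of Python. This is a toy illustration only: the dead-key table, the tiny Romaji table, and the candidate dictionary are hypothetical stand-ins for the large, locale-specific data a real input method would use, and Unicode normalization is used here simply as a convenient way to combine a base letter with a diacritical mark.

```python
import unicodedata

# Simple input method: a diacritical mark key followed by a base letter.
COMBINING = {"'": "\u0301", "`": "\u0300", '"': "\u0308", "^": "\u0302", "~": "\u0303"}


def compose(mark, base):
    """Combine a base letter and a diacritical mark into a single character."""
    return unicodedata.normalize("NFC", base + COMBINING[mark])


# Complex input method: Romaji is converted to Kana as it is typed.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ha": "は", "hi": "ひ", "ho": "ほ", "shi": "し", "go": "ご",
    "ni": "に", "n": "ん",
}

# Several Kanji words may share one Kana reading; the user picks from the list.
KANJI_CANDIDATES = {"にほんご": ["日本語"], "はし": ["橋", "箸", "端"]}


def romaji_to_kana(text):
    """Greedy longest-match conversion; unknown input is left as typed."""
    out, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def kanji_candidates(kana):
    """Return the alternatives offered when the conversion key is pressed."""
    return KANJI_CANDIDATES.get(kana, [kana])
```

For example, `kanji_candidates(romaji_to_kana("hashi"))` yields three alternatives, which is the point at which the user would be asked to choose.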

All  of  this  “pre-editing”  normally  should  be  hidden  entirely  from  the  application,  since  it  is 
rather  complex  and  locale-specific.  However,  this  desire  conflicts  with  another  desire — that 
of  providing  a  smooth  integration  of  pre-edit  with  normal  text  editing.  While  pre-edit  is  in 
progress,  there  are  two  basic  choices  for  where  to  display  the  pre-edit  text:  directly  inline 
with  the  normal  text,  called  “on  the  spot,”  with  things  like  justification  (in  a  WYSIWYG 
editor)  taking  place  on  the  fly,  and  “off  the  spot,”  with  the  pre-edit  text  displayed  somewhere 
below  or  to  the  side  of  the  main  text  window.  On  the  spot  pre-editing  is  generally  the  most 
desirable  for  end  users,  but  this  requires  tight  coupling  between  the  input  method  and  that 
part  of  the  application  which  is  concerned  with  the  actual  display  of  the  text.  Hence,  some 
support  for  pre-edit  must  be  provided  by  the  application  in  order  to  support  on  the  spot 
pre-edit,  although  we  do  not  want  to  require  all  applications  to  provide  such  support. 

There  are  two  basic  models  for  implementing  input  methods:  frontend  and  backend.  In  the 
frontend  model,  the  input  method  is  actually  a  separate  process  from  the  application.  While 
pre-edit  is  in  progress,  keystrokes  go  to  the  input  method  rather  than  to  the  application; 
once  a  complete  string  of  characters  has  been  converted,  the  resulting  string  is  then  sent  by 
the  input  method  to  the  application.  In  the  backend  model,  the  input  method  is  generally
linked  into  the  application  as  a  library  facility.  While  pre-edit  is  in  progress,  keystrokes
continue  to  go  to  the  application,  which  passes  them  on  to  the  input  method  for  processing. 

The  frontend  model  has  a  number  of  advantages.  For  example,  it  is  easy  to  select  an  input 
method  at  runtime.  This  can  be  important  in  Asia,  because  there  are  generally  a  number 
of  different  input  styles  in  use  for  a  given  language,  even  within  a  single  organization.  It  is 
also  possible  to  share  a  single  input  method  process  among  all  applications  on  the  display. 
This  is  important,  since  the  input  method  typically  has  very  large  resident  dictionaries,  as 
well  as  user-specific  dictionaries  that  can  be  edited  dynamically  in  one  window  and
automatically  propagate  to  other  windows.  But  the  frontend  model  has  disadvantages  as  well.
For  example,  it  is  rather  expensive  to  implement  on  the  spot  pre-editing  in  this  model,  since 
it  requires  inter-process  communication  with  the  application  on  every  keystroke.  There  are 
also  significant  event  synchronization  issues  that  are  rather  difficult  to  solve,  such  as
coordinating  changes  to  the  input  focus.  Both  frontend  and  backend  models  are  being  used  in
Japan,  and  a  goal  of  our  keyboard  input  design  is  to  support  both  models,  transparent  to 
the  application. 

The  principal  issue  in  displaying  text  is  mapping  from  the  string  encoding  used  in  the
application  to  the  glyph  encoding  used  for  fonts.  These  encodings  can  easily  differ,
particularly  when  the  string  encoding  contains  a  mixture  of  single-byte  and  multi-byte
characters,  or  state-dependent  control  sequences.  The  goal  for  internationalized  text  display
routines  is  to  insulate  the  application  from  the  complexities  of  this  mapping.  As  part  of  this 
mapping,  it  may  be  necessary  to  use  more  than  one  font  to  render  a  given  language.  For 
example,  the  set  of  characters  used  in  Taiwan  is  often  viewed  as  up  to  sixteen  “planes,”  each 
arranged  as  a  two  dimensional  array  of  characters,  each  plane  containing  several  thousand 
characters.  At  most,  two  planes  can  be  represented  with  a  single  X  font,  so  multiple  fonts 
must  be  used  to  display  a  full  set  of  such  Chinese  characters.  To  deal  with  this,  the  notion 
of  a  “font  group”  is  introduced,  which  internally  masks  the  details  of  the  number  of  fonts 
used  and  the  manner  in  which  they  are  used. 
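
A font group of this kind might behave roughly as in the following Python sketch, which splits a string into runs that can each be drawn with a single font. The class name, the representation of a character as a (plane, index) pair, and the font names are all hypothetical; the design of the actual interface is not described here.

```python
from itertools import groupby


class FontGroup:
    """Hides how many fonts are needed and which font draws which character."""

    def __init__(self, fonts):
        # fonts maps a font name to the planes it covers (at most two per font).
        self._font_for_plane = {}
        for name, planes in fonts.items():
            if len(planes) > 2:
                raise ValueError("a single font can represent at most two planes")
            for plane in planes:
                self._font_for_plane[plane] = name

    def runs(self, chars):
        """Split (plane, index) characters into consecutive runs that share one font."""
        keyed = [(self._font_for_plane[plane], (plane, index)) for plane, index in chars]
        return [(font, [char for _, char in group])
                for font, group in groupby(keyed, key=lambda pair: pair[0])]
```

The application would pass the whole string to the group and never see the individual fonts; each returned run corresponds to one ordinary single-font drawing request.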

In  addition  to  the  encoding  issue,  there  are  several  other  difficult  issues  for  text  display.  In 
some  instances,  it  may  be  necessary  to  use  multiple  glyphs  to  represent  a  single  character. 
A  simple  example  would  be  to  use  a  Latin  font  containing  only  base  letters  and  individual 
diacritical  marks;  a  given  character  would  then  be  displayed  by  using  the  glyph  for  a  base 
letter,  “overstruck”  by  the  appropriate  diacritical  mark.  In  some  languages  with  script 
letter  forms,  glyphs  representing  fragments  of  letters  may  be  used  to  compose  sequences  of 
characters  with  appropriate  tie  marks.  Another  problem  is  “nonlinear”  display,  in  which  the 
order  of  glyphs  on  the  screen  does  not  match  the  order  of  characters  in  a  string.  A  major 
example  is  text  with  both  right-to-left  and  left-to-right  text  sequences,  as  occurs  in  Arabic 
and  Hebrew.  Another  example  would  be  the  treatment  of  vowels  in  Hebrew  and  various 
Arabic  derivatives.  A  goal  of  the  internationalization  work  is  to  mask  these  details  from  the 
application.  However,  some  of  them  are  rather  difficult  issues,  and  it  remains  to  be  seen  how 
well  this  can  work. 

Text  interchange  is  a  somewhat  easier  problem  to  deal  with.  The  X  Consortium  standard 
Compound  Text  can  be  used  as  an  interchange  format  for  a  large  number  of  common
languages.  However,  Compound  Text  only  provides  the  raw  character  information,  and  does
not  provide  any  indication  of  locale;  this  information  is  generally  needed  to  process  or
display  the  information.  Unfortunately,  there  are  no  standards  for  locale  names  or  for  the
locale  mechanism  defined  by  ANSI  C,  and  no  organized  registration  mechanism  in  place  to 
collect  such  locale  names  and  attempt  to  avoid  conflicting  use  of  names.  This  makes  true 
interchange  in  a  heterogeneous  network  environment  somewhat  problematic. 

When  an  X  program  has  been  internationalized,  it  will  generally  use  resource  files  to  obtain 
locale-specific  data,  such  as  text  labels  and  other  visual  cues.  Simple  mechanisms  are  required 
to  enable  the  programmer  to  select  a  resource  file  dynamically,  based  on  the  locale,  and 
resource  databases  must  support  locale-specific  string  encodings. 
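
One simple selection mechanism, sketched below in Python, is to search for a resource file from the most locale-specific location to the least. The directory layout, the assumed `language_territory.codeset` shape of a locale name, and the function names are illustrative assumptions, not a description of the interface under design.

```python
import posixpath


def resource_file_candidates(app_class, locale, base_dir="app-defaults"):
    """List candidate resource files, most locale-specific first."""
    names = []
    if locale:
        no_codeset = locale.split(".", 1)[0]     # drop an optional codeset suffix
        language = no_codeset.split("_", 1)[0]   # drop an optional territory
        for name in (locale, no_codeset, language):
            if name not in names:
                names.append(name)
    paths = [posixpath.join(base_dir, name, app_class) for name in names]
    paths.append(posixpath.join(base_dir, app_class))  # locale-independent fallback
    return paths


def select_resource_file(app_class, locale, exists):
    """Return the first candidate for which exists(path) is true, or None."""
    for path in resource_file_candidates(app_class, locale, ):
        if exists(path):
            return path
    return None
```

Passing the existence check as a parameter keeps the sketch independent of any particular file system; an application would normally supply `os.path.exists`.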

15.5  Resource  Management 

As  we  gain  experience  with  sophisticated  toolkits  and  applications,  several  problems  with 
the  X  Resource  Manager  facilities  have  come  to  light.  Chris  Peterson  has  been
exploring  these  problems  and  developing  possible  solutions.  Four  problems  being  considered  are:
easier  means  for  users  to  override  application  default  resources;  support  for  multiple-value
resources;  conditional  resource  matching  based  on  dynamic  attributes;  and  interactive  resource
editing. 

The  current  set  of  matching  rules  does  not  take  into  account  the  fact  that  the  resource 
database  created  for  an  application  may  have  been  written  by  many  different  people.  The 
database  created  for  a  toolkit  application  is  loaded  from  several  sources,  with  parts
supplied  by  the  application  programmer  and  parts  supplied  by  the  end  user.  The  application
programmer  often  wants  to  be  very  specific  with  the  resource  definitions  (e.g.,  a  particular
button  should  have  a  border  width  of  3).  Users,  on  the  other  hand,  often  like  to  be  as
general  as  possible  (e.g.,  all  windows  should  be  green).  When  these  two  specifications  overlap,
the  application  default  setting  usually  dominates  since  the  more  specific  resource  string  will
match. 

A  possible  solution  is  to  allow  the  resource  database  to  be  divided  into  sections.  In  essence, 
there  would  be  several  smaller  databases  that  are  searched  in  order,  and  if  a  match  is  found 
for  a  resource  in  an  earlier  section,  entries  in  later  sections  would  be  ignored.  Typically  there 
would  be  two  sections;  application  default  resources  would  normally  be  loaded  into  the  last 
section,  and  users  could  ensure  an  override  by  placing  resources  into  the  first  section. 
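
The difference between a single database and a sectioned one can be sketched as follows in Python. The matcher is a deliberately crude stand-in for the real resource manager rules (it scores a specification only by its number of literal components), and the function names and resource strings are hypothetical.

```python
import re


def _tokens(spec):
    """Split a specification into components, keeping '*' as a loose-binding token."""
    return re.findall(r"\*|[^.*]+", spec)


def _match(tokens, names):
    """True if the specification tokens match the fully qualified name components."""
    if not tokens:
        return not names
    if tokens[0] == "*":  # a loose binding skips zero or more components
        return any(_match(tokens[1:], names[k:]) for k in range(len(names) + 1))
    return bool(names) and tokens[0] == names[0] and _match(tokens[1:], names[1:])


def lookup(db, name):
    """Single database: the most specific matching entry wins."""
    names, best = name.split("."), None
    for spec, value in db.items():
        tokens = _tokens(spec)
        if _match(tokens, names):
            score = sum(token != "*" for token in tokens)
            if best is None or score > best[0]:
                best = (score, value)
    return None if best is None else best[1]


def lookup_sectioned(sections, name):
    """Sectioned database: the first section with any match wins, however general."""
    for db in sections:
        value = lookup(db, name)
        if value is not None:
            return value
    return None
```

With the user's entries in the first section and the application defaults in the last, a general user entry such as `*background` overrides a fully qualified application default, while resources the user never mentions still come from the application defaults.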

Another  class  of  problem  is  exemplified  by  the  translation  table  resources  in  the  Xt  Intrinsics. 
A  text  widget  typically  has  a  default  set  of  translations  built  in  to  the  widget.  The  application 
programmer  often  wants  to  modify  a  few  of  these  translations,  based  on  specific  use  of  the 
widget  within  the  application.  Finally,  the  end  user  typically  wants  to  make  a  few  more 
modifications,  to  rebind  commands  to  their  liking.  Unfortunately,  given  the  entire-value 
replacement  semantics  of  the  resource  manager,  the  application  programmer’s  modifications 
will  be  lost  when  the  end  user  tries  to  make  modifications;  the  only  way  around  this  is  for 
the  end  users  to  be  aware  of  the  application  modifications  and  to  incorporate  them  directly 
as  part  of  their  own  modifications.  This  is  very  undesirable,  because  later  changes  by  the 
application  programmer  will  not  propagate  automatically.

A  possible  solution  is  to  allow  a  resource  query  to  return  a  list  of  values,  rather  than  just  a
single  value.  A  new  resource  specification  could  be  introduced  to  indicate  that  it  augments 
a  base  resource  specification,  rather  than  replacing  the  base.  A  new  lookup  function  could 
be  provided  to  return  the  base  value  plus  all  augmentations.  The  translation  table  parsing 
in  the  Intrinsics  could  then  be  modified  to  use  this  new  function,  and  use  the  resulting  list 
of  values  to  produce  the  desired  composite  translations. 
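
A rough Python sketch of such a lookup is shown below. The entry representation (name, value, augments flag), the rule that a later full replacement discards earlier augmentations, and the simplified "event: action" parsing are assumptions made for illustration; the action name `submit()` is hypothetical.

```python
def lookup_all(entries, name):
    """Return the base value followed by every augmentation, in load order.

    entries is a list of (name, value, augments) tuples in the order loaded:
    widget defaults first, then the application, then the end user.
    """
    base, extras = None, []
    for entry_name, value, augments in entries:
        if entry_name != name:
            continue
        if augments:
            extras.append(value)
        else:
            base, extras = value, []  # assumption: a full replacement starts over
    return ([] if base is None else [base]) + extras


def compose_translations(values):
    """Merge translation tables; later bindings for the same event take precedence."""
    table = {}
    for text in values:
        for line in text.splitlines():
            if line.strip():
                event, action = line.split(":", 1)
                table[event.strip()] = action.strip()
    return table
```

Because the application's change is stored as an augmentation rather than a replacement, the end user's additions no longer erase it, and later changes by the application programmer still propagate.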

A  growing  problem  for  both  developers  and  users  is  the  inability  to  specify  that  a  given 
resource  should  be  used  only  on  a  specific  screen,  or  with  a  specific  visual.  For  example, 
if  a  server  supports  two  screens,  one  color  and  one  monochrome,  there  is  no  reasonable 
mechanism  for  users  to  choose  different  colors  depending  on  which  screen  an  application  is 
on.  For  developers,  there  is  no  way  to  make  such  choices  within  application  default  resource 
files,  so  applications  are  typically  configured  by  default  to  only  use  black  and  white,  even 
when  they  are  displayed  on  a  color  screen. 

A  possible  solution  is  to  add  a  preprocessor-style  language  to  the  resource  specification 
syntax,  one  that  allows  the  application  to  make  conditional  comparisons  to  check  attributes 
dynamically,  at  resource  lookup  time.  Simply  preprocessing  the  resource  file  when  it  is 
loaded  is  not  an  adequate  solution,  since  the  attributes  to  be  matched  may  vary  within  an 
application  as  different  objects  search  for  their  resources.  For  example,  a  given  application 
might  place  windows  on  more  than  one  screen  of  a  given  server.  When  querying  a  resource
from  the  database,  the  application  would  supply  a  set  of  attributes,  and  these  would  be 
used  to  evaluate  the  conditional  expressions  to  produce  a  (logically)  modified  instance  of  the 
database  to  resolve  the  query  against.  Typical  attributes  might  be  which  screen,  the  visual 
class  of  the  window,  the  number  of  bits  per  pixel  being  displayed,  the  particular  host  where 
the  application  is  executing,  and  the  particular  server  where  the  application  is  displaying. 
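
The essential point, that conditions are evaluated per query rather than once at load time, can be sketched in Python as below. No concrete syntax is proposed here; conditions are represented as plain dictionaries of required attribute values, and the attribute names and the last-match-wins rule are assumptions.

```python
def lookup_conditional(entries, name, attributes):
    """Resolve `name` against the entries whose conditions hold for these attributes.

    entries is a list of (condition, name, value) tuples; a condition is a dict
    of attribute values that must all match, and an empty dict always holds.
    Among the entries that apply, the last one loaded wins.
    """
    value = None
    for condition, entry_name, entry_value in entries:
        if entry_name != name:
            continue
        if all(attributes.get(key) == wanted for key, wanted in condition.items()):
            value = entry_value
    return value
```

A single application with windows on two screens would call this with different attributes for each window and could receive different values from the same loaded database, which preprocessing at load time cannot provide.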

A  final  problem  that  many  users  experience  is  mapping  between  printed  documentation 
about  resources  for  an  application  and  the  actual  visual  pieces  of  the  application  affected  by 
those  resources.  Often,  it  would  be  easier  to  work  in  an  inverse  manner,  namely  pointing  at 
an  object  on  the  screen  and  asking  what  resources  are  associated  with  it.  Chris  Peterson  has 
been  prototyping  a  possible  extension  to  the  Xt  Intrinsics  to  support  this,  in  combination 
with  a  resource  editor  program.  The  user  can  click  on  any  application  window,  and  the 
editor  will  graphically  display  a  tree  showing  the  widgets  used  by  the  application.  The  user 
can  pan  over  this  tree,  select  individual  widgets,  and  dynamically  modify  their  resources. 
The  user  can  also  select  a  node  in  the  tree,  and  have  the  corresponding  widget  in  the 
actual  application  highlighted.  At  present,  to  learn  the  complete  set  of  resources  supported 
by  a  widget,  a  companion  application  written  by  Jim  Fulton  can  be  used.  This  program 
graphically  displays  a  tree  showing  the  class  hierarchy  of  the  Athena  Widgets.  Individual 
classes  can  be  selected,  and  their  resources  can  be  displayed.  Eventually,  the  resource  editor 
will  be  enhanced  to  query  the  application  directly  for  the  set  of  supported  resources. 

15.6  User  Interface  Monitoring 

Good  graphical  user  interfaces  are  difficult  and  time  consuming  to  create.  An  interactive 
design  process  is  often  desirable,  involving  the  construction  and  evaluation  of  prototypes.
Evaluation  requires  an  effective  means  of  recording  and  analyzing  interactions  between  the
user  and  the  application. 

Jolly  Chen  examined  how  monitoring  mechanisms  can  be  added  to  a  user  interface
architecture  to  provide  intrinsic  support  for  recording  the  human-computer  dialogue.  Previous
approaches  to  monitoring  have  generally  captured  information  either  at  a  very  high  level, 
requiring  modification  of  the  application,  or  at  a  very  low  level,  recording  keystrokes  and 
other  raw  device  events.  Both  of  these  levels  have  significant  drawbacks,  either  requiring 
significant  modification  to  the  application  or  requiring  an  extremely  detailed  knowledge  of 
the  application  in  order  to  map  low  level  events  into  meaningful  semantic  actions. 

The  key  to  intrinsic  monitoring  is  inspection  of  the  communication  channels  between  the 
application  objects  and  the  interaction  objects  within  a  program.  The  information
communicated  at  this  level  does  not  capture  the  full  semantic  intent  of  the  user,  but  it  does  offer  a
higher  level  view  of  the  user’s  actions  than  that  offered  by  recording  raw  device  events.  For 
example,  instead  of  recording  only  that  a  mouse  button  was  pressed  at  location  (100,  300), 
it  is  possible  to  record  that  the  second  item  on  a  particular  menu  was  selected.  If  the  user 
interface  architecture  satisfies  a  few  straightforward  requirements,  intrinsic  monitoring  can 
be  easily  added,  without  requiring  support  from  the  application  programmer. 

Jolly  implemented  such  a  monitoring  mechanism  with  a  few  small  changes  to  the  Xt
Intrinsics.  The  Xt  Intrinsics  architecture  provides  two  main  communication  channels  between
the  application  and  interaction  objects:  callbacks  and  actions.  Both  of  these  channels  are
easy  to  monitor.  Actions  can  also  be  used  for  communication  within  and  between
interaction  objects,  but  this  level  of  detail  is  often  of  interest  to  the  user  interface  evaluator  as
well,  particularly  when  evaluating  the  design  of  new  interaction  objects.  Using  the  standard 
object  naming  conventions  of  the  Intrinsics,  resources  can  be  defined  to  enable  and  disable 
recording  of  individual  callbacks  and  actions,  or  groups  of  them. 
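
The idea of monitoring at the callback level can be sketched in Python as a wrapper placed around each callback when it is registered. This is an analogy rather than the Xt implementation: the `Monitor` class, the incident fields, and the "widget.callback" naming used to enable recording are hypothetical.

```python
import time


class Monitor:
    """Records an incident each time a monitored callback or action runs."""

    def __init__(self, enabled=None, clock=time.monotonic):
        self.enabled = enabled  # set of "widget.callback" names; None records everything
        self.clock = clock
        self.incidents = []

    def wrap(self, widget, name, callback):
        """Return a callback that behaves like `callback` but is also recorded."""
        def monitored(*args, **kwargs):
            start = self.clock()
            try:
                return callback(*args, **kwargs)
            finally:
                duration = self.clock() - start
                if self.enabled is None or f"{widget}.{name}" in self.enabled:
                    self.incidents.append(
                        {"widget": widget, "name": name, "start": start, "duration": duration}
                    )
        return monitored
```

The application code is unchanged: it still supplies an ordinary callback, and the toolkit decides whether to install the wrapped version. The recorded incident names the widget and callback rather than a raw pointer position.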

The  monitoring  mechanism  is  unobtrusive  with  respect  to  the  user  interface,  and  does  not 
appear  to  impose  a  significant  performance  penalty.  Jolly  also  implemented  an  analysis  tool 
to  process  the  resulting  data.  The  tool  is  essentially  a  database  application  with  a  graphical 
user  interface,  designed  specifically  for  storing  and  querying  the  monitored  incidents.  The 
tool  provides  various  means  for  filtering,  sorting,  and  summing  the  incidents.  For  example, 
one  can  query  for  all  incidents  that  occurred  in  a  Text  widget  and  had  a  duration  of  greater 
than  half  a  second,  or  count  the  number  of  sequences  of  three  incidents  that  begin  with  a 
Help  action  and  end  with  an  Abort  action. 
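
The two example queries can be expressed over a list of recorded incidents as in the Python sketch below. The incident fields (`widget_class`, `name`, `duration`) and function names are assumptions for illustration and do not describe the actual tool's data model.

```python
def slow_incidents(incidents, widget_class, min_duration):
    """Incidents that occurred in the given widget class and lasted longer than min_duration seconds."""
    return [incident for incident in incidents
            if incident["widget_class"] == widget_class
            and incident["duration"] > min_duration]


def count_sequences(incidents, first, last, length=3):
    """Count runs of `length` consecutive incidents that begin with `first` and end with `last`."""
    return sum(
        1
        for k in range(len(incidents) - length + 1)
        if incidents[k]["name"] == first and incidents[k + length - 1]["name"] == last
    )
```

For instance, `slow_incidents(log, "Text", 0.5)` and `count_sequences(log, "Help", "Abort")` correspond to the two queries mentioned above, given a list `log` of incident records.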

15.7  Test  Suite 

The  X  Testing  Consortium  was  a  loosely  bound  group  of  approximately  one  dozen  companies, 
working  together  on  comprehensive  test  software  for  the  X  protocol  and  the  Xlib  C  language
interface  to  it.  The  Testing  Consortium  produced  an  Alpha  Release  of  the  test  suite  in 
August  1989,  its  last  official  act.  It  is  generally  agreed  that  the  test  suite  is  still  a  long 
way  from  being  complete,  and  requires  a  fairly  careful  review  before  substantial  new  work  is 
performed.  Bob  Scheifler  put  together  a  Request  For  Proposals  for  further  development  of 
the  test  suite.  Nine  bids  were  received,  and  they  were  reviewed  by  a  multi-vendor  committee
headed  by  Bob  Scheifler.  A  single  firm  has  been  chosen  as  subcontractor  on  the  work,  which
is  expected  to  take  two  years  to  complete,  although  several  releases  of  the  suite  will  be 
produced  in  the  interim  for  evaluation  by  the  X  Consortium.  A  major  goal  of  the  work  is  to 
make  the  suite  usable  for  regression  testing,  for  validation  of  systems  claimed  to  conform  to 
Federal  Information  Processing  Standard  (FIPS)  158  on  the  X  Window  System,  and  for  use 
by  industry  organizations  in  branding  systems  as  compliant  with  their  standards. 

15.8  X  Conference 

In  January  1990,  we  hosted  the  Fourth  Annual  X  Technical  Conference.  The  purpose  of 
the  conference  is  to  present  and  discuss  leading  edge  research  and  development  in  the  X 
environment  from  both  academia  and  industry.  Having  outgrown  MIT  facilities,  this  year 
the  conference  was  held  at  the  Boston  Marriott  Copley  Place.  The  conference  consisted  of 
seven  tutorials,  26  talks,  and  23  informal  “birds  of  a  feather”  sessions,  spread  over  three 
days.  Major  themes  were  object-oriented  toolkits  and  user  interface  management  systems, 
X  server  performance  issues,  multithreaded  clients  and  servers,  input  synthesis  for  regression 
testing,  and  novel  window  managers.  Bob  Scheifler  and  Donna  Converse  handled  most  of 
the  details  for  the  technical  program.  Donna  Converse  and  Michelle  Leger  handled  the  bulk 
of  the  organizational  details,  including  scheduling,  catering,  conference  proceedings,  and 
video  tape  coordination.  MIT  Conference  Services  handled  registration.  The  conference  was 
attended  by  approximately  1200  people,  and  was  very  well  received.